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FOREWORD 


This  report  constitutes  the  results  of  two  workshops 
held  at  the  National  Bureau  of  Standards  to  survey  and  re- 
port on  the  major  technical  consequences  of  implementing  a 
three  schema  DBMS  architecture,  including  the  various  imple- 
mentation options  and  the  identification  of  any  necessary 
research. 


Workshop  1,  held  on  August  1-2,  1978,  investigated  the 
general  topic  of  data  independence.  The  participants  were 
Mr.  Charles  Bachman,  Dr.  Thomas  DeLutis,  Dr.  Rob  Gerritsen, 
Dr.  Eugene  Lowenthal,  Mr.  Frank  Manola,  Dr.  Alan  Merten,  Mr. 
Philip  Shaw,  Dr.  Diane  Smith,  and  Dr.  Gary  Sockut. 


Workshop  2,  held  on  August  22-23,  1978,  examined  sup- 
porting topics  such  as  query  languages,  data  dictionaries 
and  database  conversion.  The  participants  were  Dr.  Donald 
Chamberlin,  Dr.  Eric  demons,  Ms.  Nancy  Goguen,  Dr.  Henry 
Lefkovits,  Mr.  David  Shipman,  Dr.  Stanley  Su ,  Major  Anthony 
J.  Winkler,  and  Dr.  Carlo  Zaniolo. 


NBS  gratefully  acknowledges  the  assistance  of  the  fol- 
lowing individuals:  Dr.  Mani  Daya  contributed  many  ideas 
during  pre-meeting  planning  sessions  with  the  editors.  Dr. 
John  Smith  and  Dr.  Gary  Sockut  offered  comments  which  were 
included  in  the  report.  Dr.  Alan  Goldfine,  NBS,  played  an 
important  role  in  preparing  the  report  for  publication. 
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DATABASE  ARCHITECTURES: 
A  FEASIBILITY  WORKSHOP  REPORT 


John  L.   Berg , 
Marc  Graham, 
Kevin  Whitney, 
Ed  i tor s 


To  help  the  decision  maker  evaluate  the  po- 
tential benefits  and  pitfalls  in  moving  forward 
with  database  technology,  the  National  Bureau  of 
Standards  organized  two  workshops  whose  results 
are  presented  in  this  report.  The  workshops,  held 
in  August  1978,  explored  the  progress  plan  and 
potential  pitfalls  involved  in  specifying,  design- 
ing, and  implementing  systems  based  on  the 
ANSI/X3/SPARC  framework  and  the  CODASYL  JOD 
languages  specification.  Workshop  1  investigated 
the  general  topic  of  data  independence,  and 
Workshop  2  examined  supporting  topics  such  as 
query  languages,  data  dictionaries,  and  database 
conversion . 

Key  words:  Conversion;  Database;  Data-description; 
Data-dictionary;  Data-directory;  Data- 

manipulation;  DBMS;  Languages;  Query;  Standards. 
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1.  INTRODUCTION 


1.1     WORKSHOP  OBJECTIVES 


The  workshops  sought  to  explore  the  technical  feasibil- 
ity of  a  DBMS  architecture  with  a  high  degree  of  data  in- 
dependence using  the  ANSI/X3/SPARC  framework.  The  CODASYL 
78  JOD  languages  are  used,  where  possible,  to  insure  speci- 
ficity in  the  discussions.     Workshop  1  was  charged  with: 

1.  Specifying  criteria  for  determining  the  degree  of 
data  independence  in  any  DBMS  architecture  in  order 
to  determine  the  degree  of  independence  in  the  sub- 
ject architecture. 

2.  Providing  the  criteria  for  dividing  the  fun':tional 
components  of  a  DBMS  into  internal,  conceptual,  and 
external  schemas,  and  the  mappings  between  any  pair. 

3.  Identifying  problems  in  the  ANSI/SPARC  approach  to 
data  independence. 

4.  Assigning  the  CODASYL  specification  statement 
classes  to  the  schemas  of  the  ANSI/X3/SPARC  frame- 
work, using  the  criteria  for  data  independence 
developed  above. 

Workshop  2  was  charged  with  developing  answers  to  the  fol- 
lowing questions  in  order  to  facilitate  any  eventual  imple- 
mentation : 

1.  How  can  we  protect  the  user's  investment  in  existing 
databases  and  application  programs? 

2.  What  is  the  role  of  the  data  dictionary/directory  in 
preserving  the  user's  data  base  investment? 

3.  Is  there  anything  in  this  architecture  which  medi- 
ates against,  or  supports,  end  user  use  of  the  data- 
base? What  specifications  for  end  user  facilities 
should  be  produced? 

4.  Is  this  architecture  suited  to  the  coming  genera- 
tions of  database  environments:  distributed  data- 
bases, special  purpose  machines,  associative 
storage,  advances  in  programming  methodology,  and 
other  predictable  technological  advances? 
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1.2  MOTIVATION 


1.2.1  Goals. 

To  help  the  decision  maker  evaluate  the  potential  bene- 
fits and  pitfalls  in  moving  forward  at  this  time  with  data- 
base technology,  the  National  Bureau  of  Standards  organized 
two  workshops  whose  results  are  presented  in  this  report. 
These  two  technology  feasibility  workshops  of  two  days  each 
brought  together  16  industry  and  academic  experts  to  explore 
the  progress  plan  and  potential  pitfalls  involved  in  speci- 
fying, designing,  and  implementing  database  management  sys- 
tem technology  based  on  the  ANSI/X3/SPARC  framework  and  the 
CODASYL  JOD  languages  specifications.  The  first  workshop 
was  charged  with  discussing  the  degree  of  data  independence 
provided  by  such  an  approach  and  the  acceptability  of  the 
CODASYL  languages  specifications  as  candidates  for  the  con- 
ceptual schema  and  external  schema  of  the  framework.  The 
second  workshop  concentrated  on  implementation- related  is-  || 
sues  such  as  the  role  of  the  data  dictionary,  the  supporting  j] 
machine  environment,  and  the  distributed  databases.  The  0 
results  of  these  workshops  are  recorded   in  this  report. 

1.2.2  Accomplishments. 

The  main  product  of  the  workshops  is  this  report.  It 
was  edited  from  the  notes,  wall  charts,  and  transcribed 
proceedings  of  the  workshops.  Whenever  the  editors  felt 
there  was  substantial  concurrence  on  a  topic,  they  included 
it  in  the  report.  Where  there  were  significant  matters  of 
particular  interest  to  an  individual  participant  in  the 
workshops,  that  person  wrote  a  position  paragraph  which  is 
included  in  this  report  and  attributed  directly  to  its  au- 
thor .  This  is  indicated  in  the  text  by  the  participant's 
name,  which  appears  underlined  and  bracketed,  immediately 
following  the  position.  All  referenced  documents  are  indi- 
cated by  an  abbreviated  name  enclosed  in  brackets.  The  com- 
plete list  of  references  can  be  found  in  Appendix  A.  The 
goal  of  the  editors  of  this  report  was  mainly  to  provide  an  j 
outline  in  which  these  position  paragraphs  would  fit. 

If  we  have  asked  the  right  questions,  provided  a  frame- 
work for  analysis,  and  stimulated  productive  discourse  on 
the  technical  feasibility  of  the  ANSI  framework  for  DBMS 
with  these  workshops  and  this  report,  our  goals  will  have 
been  met . 
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2.      DATA  INDEPENDENCE 


2.1  INTRODUCTION 


In  his  book  on  database  management  systems,  C.  J.  Date 
emphasizes  the  importance  of  data  independence  by  devoting  a 
separate  section  to  the  concept.  He  writes  that  an  applica- 
tion is  data-dependent  when 

...  knowledge  of  the  data  organization  and  ac- 
cess technique  is  built  into  the  application 
logic.  ...  In  a  database  system,  however,  it 
would  be  extremely  undesirable  to  allow  applica- 
tions to  be  data-dependent.  There  are  two  major 
reasons:  (1)  Different  applications  will  need 
different  views  of  the  same  data  ...  (2)  The 
DBA  must  have  the  freedom  to  change  the  storage 
structure  or  access  strategy  (or  both)  in 
response  to  changing  requirements,  without  the 
necessity  of  modifying  existing  applications. 
[DATE] 

While  the  goal  of  data  independence  is  to  isolate  changes  in 
the  database  from  changes  in  the  application  view,  a  more 
precise  definition  was  needed  to  permit  the  workshop  parti- 
cipants to  evaluate  the  degree  of  data  independence  provided 
by  the  ANSI/X3/SPARC  framework. 

A  few  published  definitions  will  show  the  great  diver- 
sity of  views  on  exactly  what  data  independence  means: 

Data  Independence  has  very  specific  properties, 
and  can  provide  very  specific  predictable  bene- 
fits ...  even  though  data  independence  is  so 
complex  a  phenomenon  that  it  approaches  confu- 
sion, it  is  possible  to  specify  data  indepen- 
dence functions  and  capabilities.  [ANS75] 

Data  Independence  is  concerned  with  the  problems 
of  separating  application  programs  from  some  as- 
pects of  the  storage  and  structure  in  the  data 
base  ...  for  the  protection  of  investment  in 
data  and  programs  in  a  changing  business  and 
computing  environment.  Thus,  "how  much  indepen- 
dence" is  an  economic  question  involving  trade- 
offs between  flexibility  and  efficiency.  [JARD] 

Data  Independence     is    "     ...     the     immunity  of 
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applications  to  change  in  storage  structure  and 
access  strategy."  [DATE] 

Stonebraker  attempted  to  introduce  a  rigorous  defini- 
tion of  data  independence  and  to  classify  database  transfor- 
mations by  the  degree  of  independence  provided,  but  little 
has  been  done  with  his  model  since.  The  issue  to  be  dis- 
cussed, then,  is  can  we  define  data  independence  in  a  pre- 
cise and  measurable  way  and,  if  so,  how  much  do  the  ANSI  ar- 
chitecture and  CODASYL  provide? 

Before  starting  into  discussions,  the  workshop  partici- 
pants prepared  the  following  list  of  key  terms  for  which 
standard  definitions  would  facilitate  their  work: 

Data  Independence 
Data 

Information 
Integrity 
Constraint 
Index 

Of  these  items,  the  workshop  specified  data  indepen- 
dence, the  (three  types  of)  schemas  and  mappings  as  most  im- 
portant for  their  discussions. 


Schema 

Conceptual 

Internal 

External 
Logical  data  structure 
Mapping 


2.2     DATA  INDEPENDENCE  DEFINED 


The     workshop    discussion     began     with       the  workshop 
leader's  prototypical  definition: 

Data  independence  protects  (a  user's)  investment 
in  databases  and  programs  by  insulating  the  user 
from  inevitable  changes  in  applications,  data, 
and  computer  systems. 

Although  this  is  a  statement  of  objectives  rather  than 
a     proper     definition,  the  workshop  participants  improved   it  , 
to : 

The  objective  of  data  independence  is  to  permit  ! 
the  use  of  information  in  a  changing  environ-  . 
ment.   [Smith]  I 

This  point  was  later  expanded   in  a  joint  position  paper ^ 
by  Diane  and  John  Smith:  ' 

i 


-6- 


The  term  "data  independence"  has  different, 
though  related,  meanings  when  applied  at  the 
CONCEPTUAL  level  and  the  EXTERNAL  level. 


i.  Meaning  at  the  CONCEPTUAL  level:  Data 
independence  is  achieved  when  the 
relevant  structure  of  the  enterprise  is 
revealed,  and  its  representation  over 
computer  storage  is  hidden.  It  is  not 
sufficient  merely  to  hide  representation 
details — this  would  imply  that  an  "emp- 
ty" conceptual  schema  would  be  an  ade- 
quate solution. 

ii.  Meaning  at  the  EXTERNAL  level:  Data  in- 
dependence is  achieved  when  the  relevant 
structure  of  the  enterprise  appears  to 
the  user  in  a  desirable  ex  ternal 
representation,  and  the  internal 
representation  over  computer  storage  is 
hidden.  It  is  possible  that  the  internal 
representation  will  be  chosen  to  reflect 
the  desired  external  representation. 
[Smiths] 

Data  Independence  is  the  property  of  a  data 
management  system  that  provides  alternate  views 
of  the  same  stored  data,  and  preserves  them  dur- 
ing the  evolution  of  the  data  environment. 
[Manola] 

The  objective  of  data  independence  is  to  permit 
the  continued  acquisition,  storage,  retrieval, 
and  dissemination  of  information  in  support  of 
the  operation  of  the  enterprise  over  time.  The 
effect  is  to  insulate  the  enterprise  from  the 
inevitable  changes  that  occur  in  applications, 
data,  computers,  and  the  enterprise's  view  of 
itself.   [De Lutis] 

These  attempts  to  agree  on  an  intensional  definition  (one 
which  designates  the  qualities  of  objects  to  which  it  ap- 
plies) having  failed,  the  group  turned  to  devising  an  exten- 
sional  definition  (one  which  designates  the  objects  to  which 
it  applies)  of  data  independence.  The  workshop  participants 
devised  this  extensional  definition  of  data  independence  by 
enumerating  the  classes  of  changes  permitted  by  a  database 
system  with  a  high  degree  of  data  independence.  The  follow- 
ing list  of  capabilities  as  relating  to  data  independence 
was  recorded  and  augmented  by  Bachman: 
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a.  changing  character  of  floating  point  number    or  in- 
teger representation, 

b.  changing  record  delimiter  mechanism, 

c.  changing  names  of  records,  sets,  and  items, 

d.  changing  location  within  record  of  item, 

e.  adding/subtracting  unreferenced  items, 

f.  changing   internal  format  of  items,  including  coding 
schema , 

g.  changing  precision  of  data  items  to  make    more  pre- 
cise , 

h.  changing  units  of  measure, 

i.  adding  new  record  and  set  types, 

j.     adding  consistency  and  derivation  declarations, 
k.     changing  data  model, 

1.     refining  conceptual  schema  declarations, 
m.     changing  primary  and  secondary    key     indexing  tech- 
niques, 

n.     changing  set  implementation  technique, 
o.     changing  location  of  records  from  one  site  to  anoth- 
er, 

p.     changing  number  of  records  or  specific  content  which 
represent  a  single  real  world  entity. 

Throughout  this  discussion  it  was  clear  that  various 
levels  of  data  independence  were  poss ible--correspond ing  to 
the  characteristics  of  the  data  that  could  be  changed 
between  storage  and  use.  Each  schema  is  a  description  of 
the  data,  with  emphasis  on  various  characteristics  of  the 
data  as  shown  in  the  following  table: 


ANSI  schema 

ex  ternal 
conceptual 

internal 


Data  characteristic 

format 

structure 

access 

representation 


DIAM  level 

end  user 
infolog  ical 
string 
encoding 
phys  ical 
dev  ice 


There  may  be  mappings  between  each  level,  with  a  degree 
of  data  independence  permitted  at  each  level.  Some  changes 
such  as  renaming  a  field  may  occur  at  one  or  more  levels, 
while  others  such  as  selecting  the  indexing  mode  occur  only 
at  one  level. 

Several  participants  used  a  model  like  this  one  to 
clarify  their  view  of  some  aspects  of  data  independence. 
Gerritsen  argued,  for  example,  in  the  following  position 
paragraph,  that  machine  independence  is  not  a  concern  of 
DBMS  data  independence: 
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Computer  independence  should  be  considered 
separately  from  the  overall  question  of  data  in- 
dependence because: 

1.  Change  in  computer  and/or  DBMS  occurs 
much  less  frequently  than  changes  in 
data,  applications,  and  enterprise 
model . 

2.  Insulating  programs  from  changes  in  com- 
puter transcends  the  capability  of  the 
DBMS.  Examples  of  changes  from  computer 
to  computer  that  affect  programs  but 
cannot  be  controlled  by  the  DBMS  in- 
clude: 

a)  collating  sequence 

b)  number  of  distinct  characters 

c)  number  of    characters    stored  per 
word 

d)  precision  of  arithmetic. 

3.  A  large  degree  of  computer  independence 
can  be  attained  by  means  that  have  no 
relation  to  the  means  used  to  obtain  in- 
dependence from  changes  in  applications, 
data,  and  the  enterprise  model.  Comput- 
er independence  can  be  attained  simply 
by  implementation  of  the  DBMS  on  a 
variety  of  computers.  Examples  of  DBMS 
that  are  currently  available  on  a  large 
variety  of  computers  are  TOTAL  and  SEED. 
[Gerr i tsen] 

Using  examples  from  his  paper  on  database  reorganiza- 
tion, Sockut  divided  database  changes  into  four  DIAM  levels 
as  follows : 

Changes  at     the     infological     level  (conceptual 
schema)  : 

attributes  can  be  added,     deleted,  com- 
bined, split,  or  renamed 

relationships  can  be  created,  destroyed, 
or  renamed 

migrating  an  attribute  in  a  1-N  rela- 
tionship from  the  1  to  the  N  or  vice 
versa 
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-  changing  among  1-1,  1-N,  and  M-N  rela- 
tionships. 

Changes  at  the  string  level  (internal  schema): 

-  creating,  destroying,  or  renaming  a 
string  ( record  or  set) 

rearranging     fields    on    a      record  or 
records  in  a  set 

-  establishing  or  removing  a  secondary  in- 
dex, or  a  search  key  within  a  record 
type  that  is  densely  indexed  on  that  key 

-  changing  the  order  of  a  set's  members. 

Changes  at  the  encoding  level  (internal  schema): 

modifying  the  basic  representation, 
scale,  encryption,  size,  precision, 
character  code,  etc.,  of  an  attribute 
encoding 

-  modifying  the  relationship  encoding  as, 
for  example,  changing  set  implementation 
among  embedded  chains,  pointer  arrays, 
and  bit  maps. 

Changes  at  the  physical  device     level  (internal 
schema) : 

-  changing  access  methods 

changing  hash  parameters 

remapping  areas  to  devices 

eliminating  overflow.  [Sockut] 

The  focus  on  levels  of  data  description    and     the  map- 
pings   between  them  clearly  delineates  the  major  distinction, 
between  a  two  schema  and  a  three  schema  DBMS  framework.  In 
a     system     using  only  two  schemas,  one  for  representing  data 
as  stored  and  another  for  representing  data  as     used,  there' 
will  be  N  X  M  mappings  for  N  storage  views  and  M  application! 
views.     When  a  storage  view  changes,  M  mappings  must  change; 
when    an    application    view  changes,  N  mappings  must  change. | 
By  contrast,  a  three  schema  system  has  a    single  conceptual, 
view    separating     the    storage    views     from     the  application] 
views.     In  this  case,  only  one  mapping  changes  if  a  storage 
view  or  an  application  view  changes. 
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At  run  time  these  two  system  organizations  produce  the 
same  database  manipulations,  so  they  provide  an  identical 
degree  of  data  independence.  Thus,  the  important  question 
is  not  the  theoretical  degree  of  data  independence  (complex- 
ity of  mappings  from  storage  to  usage),  but  the  practicality 
of  providing  that  degree  of  data  independence.  A  capability 
which  is  so  inconvenient  that  it  is  never  used  provides  no 
greater  data  independence  than  one  which  is  theoretically 
nonexistent.  In  this  sense,  the  three  schema  approach  pro- 
vides much  more  data  independence,  since  a  change  requires 
changing  only  a  single  mapping  rather  than  N  storage  or  M 
usage  mappings. 

This  distinction  between  theoretical  and  actual  data 
independence  is  clearly  expressed   in  Lowenthal's  definition: 

Data  Independence  is  a  property  of  the  program 
interface  provided  by  a  EMS;  namely,  it  is  a 
(necessarily  qualitative)  measure  of  the  extent 
to  which  changes  in  the  implementation  of  the 
interface,  and  specifically  changes  in  the  or- 
ganization of  the  data,  can  be  made  without  the 
requirement  to  modify  programs. 

Potential  data  independence  more  precisely 
refers  to  the  changes  which  could  be  made.  This 
can  be  determined  from  the  interface  itself. 

Actual  data  independence  refers  to  those  aspects 
of  the  potential  data  independence  which  are  ac- 
tually supported  by  the  EMS — i.e.,  the  existence 
of  mechanisms  to  1)  change  the  data  and  2)  con- 
tinue to  preserve  the  program's  view  of  the 
data . 

As  with  other  measures  of  user  interfaces,  such 
as  "  high- levelness ,"  "ease-of-u  se  ,"  etc.,  data 
independence  is  not  absolute,  but  must  be  relat- 
ed to  the  needs  of  the  enterprise.  [Lowenthal] 


2.3     THE  THREE  SCHEMA  FRAMEWORK 


2.3.1  ANSI  Schemas  and  Their  Mappings. 

After  discussing  the  aspects  of  data  independence,  the 
workshop  proceeded  to  investigate  the  ANSI  framework  and 
specifications  for  its  schemas  and  the  mappings  between 
them.  These  schema  definitions  and  mappings  provide  the 
measure  of  data  independence  being  sought. 
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Figure  1  shows  the  subset  of  the  ANSI/X3/SPARC  archi- 
tecture of  primary  concern  to  our  discussion.  Only  those 
interfaces  of  direct  interest  to  the  data  independence 
workshop  are  shown.     These  are: 

1)  Jonceptual  Data  Description 

4)  External  Data  Description 

7)  External  Data  Manipulation  Language 

13)  Internal  Data  Description 

36)  Internal/Conceptual  Data  Transformation 

38)  Conceptual/External  Data  Transformation 

The  internal,  conceptual,  and  external  schemas  are  de- 
fined in  the  ANSI/X3/SPARC  documents  [ANS75],  [ANS77].  In 
general  terms,  the  conceptual  schema  represents  a  long-term 
view  of  the  enterprise  independent  of  its  databases  or  in- 
formation processing  applications.  The  internal  schema  is  a 
description  of  the  organization's  databases.  An  external 
schema  is  a  description  of  the  data  used  by  a  collection  of 
application  programs  of  the  enterprise.  Logical  data  ele- 
ments of  an  external  schema  need  not  correspond  in  a  one- 
to-one  way  with  physical  data  fields  described  in  the  inter- 
nal schema.  The  values  of  external  data  items  may  be  trans- 
lations, transformations,  concatenations,  logical  or  arith- 
metic computations,  or  other  algorithms  performed  on  one  or 
more  internal  data  values. 
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F  ig  ur  e  1 

ANSI/SPARC  DBMS  Study  Group  Architecture 
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2.3.2  The  Conceptual  Schema. 


Some  criteria  suggested  by  workshop  participants  for 
permitting  a  language  element  to  be  in  the  conceptual  schema 
or  to  be  excluded  from  it  were  the  following: 

-    The  function  declared  by  the  language  element  should 
be  appropriate  for  the  schema 

This  type  of  language  element  is  required  for  the 
schema 

Each  function  should  be  described  in  only  one  schema 
or  part  of  a  schema 

There  should  be  only  one  way  of  representing  rela- 
tionships in  the  conceptual  schema 

Descriptions  in  the  schemas  should  not  duplicate  one 
another 

Nothing  in  the  conceptual  schema  is  ignorable;  each 
declaration  must  have  effect  or  import  to  the  enter- 
prise administrator 

The  conceptual  schema  factors  repetitive  declara- 
tions out  of  the  internal  and  external  schemas. 

Smith  commented  on  what  should  be  specified  at  the  con- 
ceptual level: 

The  conceptual  schema  is  intended  to  capture  all 
aspects  of  an  enterprise  necessary  to  provide 
support  information  for  its  operation.  Thus, 
both  the  structural  and  behav ioral  aspects  of 
the  enterprise  must  be  specified. 

Structural  aspects  consist  of  the  entities , 
attributes ,  categories ,  and  relationships 
comprising  the  enterprise  as  well  as  the  naming 
mechanisms  used  to  reference  them.  Important 
naming  mechanisms  are:  associative  naming  (e.g., 
the  employee  with  ID#  =  99999);  operative  naming 
(e.g.,  the  employee  last  hired);  functional  nam- 
ing (e.g.,  the  number  "2+3");  and  relationship 
naming  (e.g.,  the  employee  who  is  assigned  to 
project  Q  .  It  is  of  utmost  importance  to  note 
that  it  is  unlikely  a  given  structural  component 
is  interpreted  in  the  same  way  throughout  the 
enterprise.  For  example,  in  a  hotel  the  object 
"reservation"  may  be  interpreted  by  some  class 
of  users  as  an  entity    with     attributes  "date," 
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"reservee,"  and  "room."  A  second  class  of  users 
may  interpret  "reservation"  as  a  relationship 
among  the  entities  "reservee,"  "date,"  and 
"room,"  where  "reservee"  has  attributes  "name," 
"address,"  etc.  Another  class  of  users  may  view 
a  "hotel  reservation"  as  one  subcategory  of 
"reservation"  among  many  such  as  "plane  reserva- 
tion," "library  book  reservation,"  etc.  Thus, 
the  conceptual  schema  must  specify  the  structure 
in  a  way  that  permits  all  these  different  in- 
terpretations to  be  captured. 

The  behavioral  aspects  of  an  enterprise  consist 
of  the  operations  performed  within  an  enter- 
prise. Such  operations  are  reflected  in  the  in- 
formation system  as  insertion,  deletion,  and  up- 
date operations  applied  to  its  structural  as- 
pects. It  is  important  to  note  that  the  effects 
of  such  operations  are  rarely  restricted  to  iso- 
lated objects  in  the  structure.  Rather,  they 
have  side  effects  that  ripple  over  related  ob- 
jects. For  example,  in  a  personnel  application 
the  termination  of  an  employee  would  require  not 
only  his  deletion  from  the  system,  but  also  his 
removal  from  health  plans,  car  pools,  and  pro- 
ject assignments.  It  would  probably  also  re- 
quire new  assignments  being  made  to  fill  some  of 
these  newly  emptied  slots.  These  side  effects 
must  also  be  captured  in  the  conceptual 
schema--either  as  an  aspect  of  an  operation  or 
as  an  integrity  constraint  specified  separately 
from  the  operation.  [Smith] 


2.3.3  The  External  Schemas . 


Workshop  participants  seemed  generally  in  agreement  on 
the  nature  of  external  schemas,  making  such  comments  as: 

-  The  CODASYL  subschema  is  a  proper  example  of  an 
external  schema 

-  The  external  schemas  will  be  allowed  to  use  data 
models  convenient  to  the  user's  applications  and 
programming  languages 

Multiple  concurrent  external  schemas  are  needed  to 
isolate  one  user's  view  from  another  user" s  view  of 
the  database.  For  example,  record  order  and  struc- 
ture should  be  variable  from  one  external  schema  to 
another 
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The  external  schema  is  the  proper  level  of  granular- 
ity for  access  control. 

2,3.4  The  Internal  Schema. 

The  workshop  also  discussed  the  internal  schema  brief- 
ly, with  comments  such  as: 

The  CODASYL  DDL  and  DSDL  form  a  proper  example  of  a 
possible  internal  schema: 

-    When  a  set  expresses  the  binding  of  a  relationship, 
it  is  properly  part  of  an  internal  schema 

Similarly,  when  a  set  expresses  an  access  path,  it 
belongs  in  the  internal  schema. 

Only  one  position  paragraph  was  written  on  the  internal 
schema,  arguing  that  its  existence  need  not  be  known  by  the 
users  of  a  data  management  system  or  by  the  enterprise  ad- 
ministrator: 

Given  a  conceptual  schema,  a  set  of  external 
schemas,  data  volumes,  access  frequencies,  logi- 
cal access  path  frequencies,  storage  device 
characteristics,  performance  requirements,  and 
other  performance/ storage  constraints/ costs , 
then  it  is  possible  to  represent  internal  schema 
design  decisions  in  a  mathematical  optimization 
model.  Such  a  model  can  perhaps  be  solved  "op- 
timally" or,  through  heuristics,  be  used  to  find 
"good"  solutions.  Such  a  model  can  be  incor- 
porated in  the  DBMS.  Since  the  DBMS  only 
creates,  modifies,  and  uses  the  internal  schema, 
the  user  need  not  know  of  its  existence.  Hence, 
except  for  internal  requirements  of  the  DBMS, 
the  concept  of  " internal  schema"  can  be  estimat- 
ed .  [Gerritsen] 


2.4     USING  DDLC  JOD78   IN  THE  ANSI  FRAMEWORK 


Rather  than  allocating  DDLC  language  elements  to  the 
three  schemas,  the  workshop  participants  grouped  all  data 
description  language  elements  into  the  conceptual  schema. 
Then,  those  relating  to  the  storage  and  efficient  retrieval 
of  the  data  were  moved  to  the  internal  schema,  and  those  re- 
lating to  user  views,  applications,  and  programming 
languages  were  moved  to  the  external  schemas. 
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Smith  made  the  following  statement  about  the  inclusion 
of  operations  in  the  conceptual  schema: 

The  conceptual  schema  should  describe  more  than 
just  the  structure  of  an  enterprise.  It  is 
equally  important  to  describe  its  behav  ior  as 
characterized  by  the  operations  that  effect 
changes  within  it.  Such  operations  can  be  de- 
fined over  the  components  of  the  structure 
without  introducing  any  implementation  detail. 
Each  operation  should  be  specified  in  terms  of 
basic  insert,  delete,  and  update  operators, 
high-level  naming  mechanisms,  and  we  11- conceived 
control  structures.  The  use  of  appropriate  nam- 
ing mechanisms  makes  it  possible  to  refer  to  ob- 
jects without  considering  access  paths  and  other 
representation  detail.  Control  structures  are 
necessary  to  define  the  scope  of  the  objects  af- 
fected and  could  take  the  form  of  quantifiers  or 
more  conventional  constructs  such  as  recursion 
and  iteration. 

Such  an  extended  specification  provides  several 
advantages : 

i.  it  permits  users  to  understand  how 
change  is  effected   in  the  enterprise, 

ii.  it  provides  implementors  with  a  basis 
for  implementation  optimization, 

iii.  it  provides  DBAs  with  a  basis  for  veri- 
fying that  high-level  integrity  con- 
straints will  be  maintained, 

iv ,  it  provides  a  basis  for  access 
control--if  all  accesses  are  channeled 
through  only  the  specified  operators. 
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This  discussion  of  which  data  description  language 
statements  should  belong  in  the  conceptual  schema  was  organ- 
ized according  to  the  DDLC ' s  nine  categories  of  language 
elements.  A  summary  of  the  allocation  of  language  elements 
is  shown  in  Table  1,  which  should  be  interpreted  according 
to  the  following  two  paragraphs: 

The  schema  category  identifies  a  schema  and  names  its 
characteristics.  Its  syntactic  elements  include  the  schema 
access- control  clause,  the  schema  CALL  clause,  and  the  SCHE- 
MA NAME  clause.     The  schema  category  inspired  no  comment. 

The  structure  category  names  the  data  structures  that 
the  schema  describes.  The  syntactic  elements  of  this  ca- 
tegory include  the  Data-name  clause,  the  KEY  clause,  the 
OWNER  and  MEMBER  clauses,  the  SET-NAME  and  RECORD  NAME 
clauses,  the  OCCURS  clause,  and  the  ORDER  clause. 


Discussion  of  the  structure  category  focused  on  the 
adequacy  of  the  CODASYL  set  for  defining  the  relationships 
among  entities  described  by  the  conceptual  schema.  Several 
participants  objected  to  the  multiple  uses  of  the  CODASYL 
set  construct,  as  explained  in  the  following  position  para- 
graphs. Smith  objects  to  the  use  of  SET  to  show  access  path 
information. 
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Table  1 

DDLC  Allocation  of  Language  Elements 


LANGUAGE  ELEMENT 
CATEGORY 

ASPECTS  INCLUDED  IN  SCHEMAS 

INTERNAL 

CONCEPTUAL 

EXTERNAL 

■ 

1.  SCHEMA 

X 

y 

NAME,  CALL, 

ACCESS  CONTROL 

2.  STRUCTURE 

NAMES 

X 

X 

X 

KEYS 

PERFORMANCE 

ACCESS 

SETS,  GROUPS 

ACCESS  PATHS 

RELATIONSHIPS 

ORDER 

PERFORMANCE 

SEMANTIC 

PRESENTATION 

MAPPING 

PLACEMENT 

REPRESENTATION 

3.  VALIDATION 

CHECK  PICTURE  TYPE 

REPRESENTATION 

DOMAIN 

PRESENTATION 

IDENTIFIES,  INSERTION 

X 

CONSTRAINTS,  SOS 

X 

SOURCE/RESULT 

X 

4.  DML  INTERFACE 

v 

A. 

5  ACCESS  CONTROL 

X 

X 

6.  MEASUREMENT 

7.  TUNING 

X 

X 

X 

8.  RESOURCE 

ALLOCATION 

X 

9.  ADMINISTRATION 

X 

X 

X 
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Certainly  the  SET  and  RECORD  statements  can  be 
used  at  the  conceptual  level  to  specify  rela- 
tionships between  objects.  However,  in  prac- 
tice, these  statements  are  frequently  used  to 
specify  the  access  paths  that  are  to  support  the 
relationships.  For  example,  consider  the  fol- 
lowing schema  and  three  CODASYL  descriptions: 


Figure  2 
Enrollment  Example 


STUDENT 


ENROLLMENT 


CLASS 


GRADE 


B 


RECORD 

Enrollment 

05 

Name 

05 

Class  # 

05 

Grade  # 

RECORD 

Enrollment 

05 

Grade  # 

RECORD 

Student 

05 

Name 

RECORD 

Class 

05 

Class  # 

SET 

SE 

OWNER 

Student 

MEMBER 

Enrollment 

SET 

CE 

OWNER 

Class 

MEMBER 

Enrollment 

RECORD 

Enrollment 

05 

Name 

/*  Flat  File  Model  */ 


/*   CODASYL  Model  */ 
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05 

Class 

# 

05 

Grade 

# 

RECORD 

Student 

05 

Name 

RECORD 

Class 

05 

Class 

RECORD 

Grade 

05 

Grade 

# 

This  structure  implies  a  large  number  of  access 
paths,  many  of  which  are  redundant. 

Conceptually y  each  of  the  CODASYL  descriptions 
permits  the  same  information  to  be 
ex tracted--but  using  different  access  paths. 
Thus,  RECORD  and  SET  are  good  constructs  for 
specifying  access  paths--an  internal  schema  pro- 
cess. This  freedom  to  specify  the  enterprise  in 
multiple  ways,  employed  at  the  conceptual  level, 
would  yield  multiple  specifications  of  the  en- 
terprise. This  requires  a  schema  reconciliation 
step  that  could  be  avoided  either  by  disciplin- 
ing the  use  of  constructs  or  by  providing  a  sin- 
gle construct  that  does  only  what  is  wanted. 
The  second  option  is  usually  more  practical  to 
implement.  [Smith] 

Sockut  amplifies  this  point,  and  also  has  the  same  com- 
plaint about  the  OCCURS  clause: 

The  CODASYL  Set  is  inadequate  as  a  mechanism 
for  describing  relationships  in  the  conceptual 
schema  because: 

1.  The  only  way  to  specify  a  relationship 
is  to  define  a  SET  or  OCCURS,  which 
unfortunately  specify  the 
relationship's  implementation  as  well. 
It  would  be  better  to  separate  rela- 
tionship definition  (C- level)  from  re- 
lationship implementation  definition 
(I-level  and  possible  E-level  as 
well)  . 

2.  SET  sometimes  means  both  a  relation- 
ship and  an  access  path  ( if  it  is  a 
non-singular  set)  ,  and  sometimes  it 
means  only  the  access  path  ( if  it  is  a 
singular  set)  . 

Here  is  my  view  of  the  ideal: 
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1,  Conceptual  schema  level:  Specify  re- 
lationship including  its  multiplicity 
(i.e.,   1-1  f   1-many  or  many-many). 

2.  Internal  schema  level  (and  possibly 
external  schema  level):  specify  im- 
plementation of  relationship  (e.g., 
SET  vs.  OCCURS).  Also  specify  access 
paths  which  do  not  implement  relation- 
ships (e.g.,  a  SET  with  OWNER=SYSTEM) . 
[Sockut] 

Lowenthal  also  comments: 

CODASYL  DDL  provides  two  ways  of  defining  1-N 
■  relationships--sets      and      nested  repeating 

groups.  At  the  conceptual  level,  there  should 
be  only  one  way  of  expressing  a  relationship; 
linked  list  (set)  vs.  contiguity  (repeating 
group)  should  be  an  internal  consideration. 
[Lowenthal ] 

Since  the  OCCURS  clause  affects  the  data  storage  struc- 
ture, it  should  be  moved  to  the  internal  schema  from  the 
conceptual.  If  an  external  schema  is  used  with  a  program- 
ming language  which  supports  repeating  groups,  then  OCCURS 
may  also  be  used  there. 

The  OCCURS  clause  affects  both  the  internal  and 
external  schemas.  There  seems  to  be  agreement 
that  OCCURS  affects  the  external.  It  affects 
the  internal  because  it  controls  implementation 
of  a  relationship  through  physical  contiguity 
rather  than  with  pointers.  The  presence  of  the 
OCCURS  in  the  DDL  as  well  as  in  the  COBOL 
subschema  DDL  is  a  clear  indication  that  CODASYL 
intends  for  the  presence  of  OCCURS  to  affect  the 
storage  structure.  [Gerritsen] 

There  was  considerable  discussion  of  the  ORDER  clause 
and  its  roles.  Performance  aspects  belong  in  the  internal 
schema,  semantic  aspects  belong  in  the  conceptual  schema, 
and  presentation  or  display  aspects  belong  in  the  external 
schemas.  The  following  comment  argues  for  excluding  ORDER 
from  the  conceptual  schema: 

The  effect  of  supporting  ORDER  as  a  reasonable 
conceptual  schema  attribute  has  a  negative  ef- 
fect on  data  independence  ( in  the  CODASYL  case) 
because  a  given  set  type  can  have  only  one  ORDER 
clause  in  the  CODASYL  DDL.  A  set  may  have,  in 
the     real     world,  several  valid  ordering s-- i .e . , 
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different  users  perceive  different  orderings. 

If  the  enterprise  administrator  changes  his  mind 
about  which  mechanism  is  to  be  used  to  express 
an  ordering,  some  programs  will  have  to  be 
changed.  Contrast  this  with  the  relational  ap- 
proach where  all  orderings  are  item  based  and 
all  are  symmetric.     [ Lowenthal ] 

Smith,  however,  notes  an  important  exception  to  this 
point: 

When  order  is  used  to  improve  the  performance  of 
a  database  management  system  (DBMS) ,  then  it 
should  be  considered  an  implementation  mechanism 
and  specified  in  the  internal  schema.  However, 
when  an  ordering  is  used  within  an  enterpr ise 
(to  facilitate  either  communication  or  its  ac- 
tivities), this  should  be  reflected  in  the  con- 
ceptual schema.  Typically  queue  and  stack  dis- 
ciplines may  be  used  to  service  customers.  If 
the  service  operators  are  included  in  the  con- 
ceptual schema  (see  position  paper  by  D.  Smith 
on  the  inclusion  of  operators  in  the  conceptual 
schema) ,  then  the  notion  of  order  is  necessary 
to  utilize  these  operators  as  intended.  This 
capability  extends  the  applicability  of  DBMSs 
beyond  their  current  use  as  record  keeping  sys- 
tems to  the  more  dynamic  situations  inherent  in 
control  applications.     [ Smith] 

Another  DDLC  structure  clause  which  caused  debate  was 
the  KEY  clause  which  defines  entry  points  to  the  database. 
The  primary  objection  is  expressed  in  this  position  para- 
graph : 

The  "KEY  -  IS"  clause  in  CODASYL  1978  detracts 
from  data  independence  objectives  in  that  the 
intent  of  the  clause  is  to  define  an  "entry"  to 
the  record  type  from  the  DML.  The  effect  is  to 
bind  the  DML  to  the  DDL.  E.g.,  if  I  modify  the 
schema  to  delete  the  record  key  attribute  from 
an  item,  then  all  programs  which  used  the  item 
as  an  entry  point  will  now  be  invalid — the  DML 
processor  will  reject  the  FIND.  [Lowenthal] 

The  val id at ion  category  declares  rules  that  constrain 
occurrences  of  the  data  structures  declared  in  the  structure 
category.  The  syntactic  elements  of  this  category  include 
the  CHECK  and  DUPLICATES  clauses,  the  record  IDENTIFIER 
clause,  the  TYPE  and  PICTURE  clauses. 
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Data  validation  was  generally  agreed  to  be  a  function 
of  the  conceptual  data  description  except  for  the  following 
statements  and  clauses: 

The  CHECK  clause  specifying  range  restrictions  of  a 
data  element  may  also  appear  in  an  external  schema 
as  long  as  the  constraint  in  the  external  schema  is 
more  restrictive  than  the  constraint  in  the  concep- 
tual schema. 

-  Those  aspects  of  the  PICTURE  and  TYPE  clauses  that 
express  the  precision  or  domain  of  data  elements  be- 
long in  the  conceptual  schema  since  they  specify  se- 
mantic information  about  the  data  items  in  the  data- 
base. Those  aspects  which  express  the  format  and 
presentation  of  the  data  items  belong  in  the  exter- 
nal schemas. 

The  structural  constraint  clause  is  a  way  of  speci- 
fying a  relationship  rather  than  a  data  value  con- 
straint. In  this  sense,  it  more  properly  belongs  in 
the  structural  category  than  in  the  validation  ca- 
tegory. 

The  DML  interface  category  declares  procedures  which 
may  be  involved  by  a  DML  function  and  parameters  to  be  sup- 
plied in  these  procedures.  The  syntactic  elements  of  this 
category  include  the  RANGE  KEY  clause,  the  set  occurrence 
SELECTION  clause,  and  the  WITHIN  area  clause.  This  was  the 
only  time  in  the  discussion  that  the  interrelationships  of 
the  DDL  and  the  DML  were  explicitly  addressed.  Lowenthal 
commented  in  two  position  paragraphs: 

We  have  ignored  the  effect  the  DML  has  on  data 
independence  characteristics--rem ember  my  defin- 
ition of  data  independence  refers  to  the  "pro- 
gram interface"  which  includes  the  EML.  For  ex- 
ample, consider  the  N-1  relationship: 
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Figure  3 
An  N-1  Relationship 


DEPT.  # 
LOCATION 

EMPLOYEE  # 

DEPT.  #  (FOR  RELATIONAL) 
SALARY 


The  query  expressed  in  S2000  query  language  is: 

WHERE  salary  GT  $10,000  AND 
location  EQ  plantQ 

and  the  same  query    expressed     in    a  relational 
calculus  language  is: 

WHERE  dept .location  EQ  plantQ 
AND  THERE  EXISTS     employee  : 

dept.dept#  EQ  employee  .dept# 
AND    salary    GT  $10,000 

Note  that  the  S2000  statement  refers  only  to 
items  (domains)  and  not  records  (relations). 
Therefore,  the  statement  is  independent  of 
whether  LOCATION  is  in  the  department  record  or 
in  the  employee  record.  This  is  not  true  in  the 
relational  EML  example--it  is  assumed  that  LOCA- 
TION is  in  DEPT.  If  LOCATION  moves  to  EMPLOYEE 
then  either  the  statement  is  invalid  or  else 
there  must  be  a  mapping  which  allows  the  exter- 
nal view  to  see  LOCATION  in  DEPT — I 'm  not  sure 
this  is  definable.  In  other  words,  a  EML  which 
minimizes  references  to  record  names  supports 
item  migration  as  an  aspect  of  data  indepen- 
dence. (Of  course,  the  CODASYL  DML  also  refers 
explicitly  to  record  names).  [Lowenthal] 

The  access  control  category  declares  authorization 
mechanisms  for  access  to  and  change  to  the  occurrences  of 
the  data  structures  declared  in  structure  category.  The 
syntactic    elements    of     this    category    include  the  ACCESS- 
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CONTROL  clauses. 


Time  did  not  permit  the  workshop  participants  to  ex- 
plore this  subject  fully.  The  following  points  were  made 
and  Manola  contributed  a  more  detailed  position  paragraph: 


Access  control  is  actually  a  property  of  the  map- 
pings between  schemas,  rather  than  the  schemas  them- 
selves . 

-  Access  control  does  not  affect  the  actual  data  in- 
dependence, but  may  have  an  effect  on  the  practical 
data  independence  of  a  data  management  system. 

The  access  control  information  belongs  partly  in  the 
conceptual  and  partly  in  the  external  schemas. 

The  proper  mechanism  for  access  control  at  the 
external  level  is  the  external  schema  ( CODASYL 
subschema).  That  is,  each  distinct  set  of  user's 
access  rights  to  the  database  should  be  defined  in  a 
separate  subschema.  Particular  subschema  have 
identical  access  rights  designated  for  that  subsche- 
ma . 

Manola  comments: 

Access  control  functions  go  into  the  conceptual 
schema  (as  well  as  in  other  schemas)  for  the 
following  reasons: 

1.  Access  control  rules  ( like  integrity 
rules)  are  part  of  the  description  of 
the  enterprise  the  conceptual  schema  is 
supposed  to  model.  Generally  speaking, 
no  one  objects  to  the  idea  that  custo- 
mers, suppliers,  and  products  are  enti- 
ties of  interest  to  certain  types  of 
conceptual  schemas,  and  should  be  de- 
fined in  them.  Moreover,  few  people  ob- 
ject to  the  idea  that,  if  there  are 
rules  of  various  kinds  about  which  sup- 
pliers supply  which  products,  or  which 
products  are  shipped  to  which  customers, 
these  sorts  of  rules  legitimately  belong 
in  the  conceptual  schema  (I  can  imagine 
a  EXDD  rule  that  says  we  don' t  ship 
weapon  x  to  country  y) .  Certainly  pro- 
ducts of  various  kinds  (as  well  as 
plants,  warehouses,  etc.)  are  enterprise 
resources  that  we  need  to  keep  track  of. 
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and  may  well  be  described  in  the  concep- 
tual schema.  Well,  the  database  of  an 
enterprise  is  an  enterprise  resource 
which  we  need  to  keep  track  of  (or  so 
database  people  have  been  preaching  for 
years)  ,  and  which  could  well  be 
described  in  the  conceptual  schema. 
Moreover,  users  of  that  database  system 
could  also  be  viewed  as  entities  of  in- 
terest, and  the  semantically  meaningful 
relationships  between  these  users  and 
the  database  (plus  the  definition  of  the 
properties  of  legitimate  users  itself) 
describe  in  some  sense  the  security 
rules  of  the  system.  That  the 
ANSI/SPARC  framework  envisaged  this  ap- 
proach is  shown  by  page  vi  of  its  report 
where  it  says  "By  defining  the  persons 
with  access  to  the  database  management 
system  as  entities  of  interest,  it  is 
possible  to  directly  model  the  rules  of 
access  and  thus  provide  the  necessary 
access  control  at  the  level  of  the  con- 
ceptual schema.  Access  may  be  further 
limited  at  other  levels," 

2.  I  would  initially  imagine  security 
pol ic ies  being  defined  at  the  conceptual 
level.  This  would  include  overall 
government  regulations  under  which  the 
company  must  operate  that  are  relevant. 
For  example,  a  privacy-type  policy  might 
be  one  which  states  that  any  person  can 
see  the  data  stored  about  himself  in  the 
database.  Another  policy  might  be  the 
normal  DOD  security  policy  that  a  person 
might  have  the  appropriate  clearance  be- 
fore he  can  see  a  piece  of  classified 
data.  The  actual  implementation  of 
these  policies  might  be  specified  at  the 
internal  level  (e.g.,  how  the  system 
stores  the  fact  that  a  piece  of  data  has 
a  certain  classification,  or  that  a  user 
has  a  specific  clearance) . 

3.  I  view  the  separation  of  the  usual  data 
in  a  database  and  the  security  and  in- 
tegrity rules,  information  about  users 
which  these  rules  might  require,  etc., 
as  being  a  very  artificial  one.  I  see 
no  reason  why  I  should  not  be  able  to 
ask  the  DBMS  what  users  have  access  to  a 
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particular  piece  of  data  or  document, 
just  as  I  ask  what  suppliers  supply  part 
xyz.  Having  to  go  through  an  entirely 
independent  mechanism  to  access  this 
type  of  information  seems  both  unwieldy 
and  insecure.  (There  are  a  number  of 
other  types  of  "metadata"  which  I  might 
want  to  get  from  the  database  too,  such 
as  what  relationships  do  objects  x  and  y 
have  in  common.) 

4.  It  is  inappropriate  to  leave  access  con- 
trol to  external  schemas,  let  alone  to 
the  operating  system.  The  operating 
system  is  able  to  control  access  in 
terms  of  the  objects  it  knows  about. 
Generally  these  are  rather  coarse  ob- 
jects like  files.  The  database  system 
will  have  to  be  responsible  for  enforc- 
c  ing  access  rules  on  those  objects  which 

it  alone  has  defined  and  knows  about 
(while  the  operating  system  protects  the 
physical  representations  and  the  data- 
base management  system  itself)  .  While 
use  of  the  external  schema  facility  to 
define  "security  views"  is  one  way  of 
implementing  access  control  (and  one 
which  I  advocate  in  many  cases) ,  it  is 
not  the  only  way.  I  would  not  like  to 
see  any  assumptions  made  here  unduly 
constrain  users  in  how  they  implement 
their  access  control.  Moreover,  this 
appears  to  require  the  burying  of  the 
access  control  rules  in  the  mapping 
specifications  for  the  external  schemas, 
rather  than  having  them  explicitly  de- 
clared somewhere.     [M  anola] 

The  measurement  category  directs  the  DBMS  in  collecting 
data  about  database  use,  population,  etc.  Since  there  are 
no  syntactic  elements  of  the  1978  JOD  DDL  in  this  category, 
the  workshops  did  not  discuss  this  category. 

The  tun  ing  category  declares  guidelines  for  database 
organization  to  assist  in  tuning  database  performance.  The 
syntactic  elements  of  this  category  include  SOURCE  and 
RESULT  clauses,  the  PRIOR  processable  clause,  the  LOCATION 
mode  is  clause,  the  LINKED  to  clause,  and  the  SEARCH  clause. 
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There  was  considerable  discussion  of  this  class  of 
language  specification,  but  little  consensus.  The  main 
points  brought  out  by  the  discussion  were: 

-    Tuning  options  affect  practical     data  independence, 
but  not  theoretical  data  independence. 

It  is  important  to  distinguish  volume  of  data  from 
frequency  of  use  in  specifying  tuning  statements. 

Record  usage  statements  should  be  in  the  conceptual 
or  internal  schema  to  reflect  global  data 
usage--rather  than  the  data  usage  of  a  single  class 
of  applications  related  by  an  external  schema.  (See 
position  paragraph  following  this  list). 

Tuning  should  be  an  internal  schema  matter  and  use- 
ful only  to  the  DBMS  which  might  use  the  information 
to  modify  the  schema  automatically. 

Three  position  paragraphs  were  submitted  on  this  topic: 

From  a  formal  point  of  view,  a  tuning  declara- 
tion (such  as  frequencies  of  use  and  volumes  of 
conceptual  entities)  exposed  in  terms  of  concep- 
tual objects  is  part  of  the  declaration  of  the 
internal-conceptual  mapping  (since  it  affects 
how  the  conceptual  objects  will  be  represented 
at  the  internal  level).  From  the  point  of  view 
of  packaging  such  declarations  into  various 
languages,  it  will  probably  be  convenient  to 
package  them  in  the  conceptual  schema,  as  this 
is  a  central  point  for  collecting  global  de- 
clarations about  the  system  as  a  whole. 
[Manola] 

Performance  oriented  estimates/ proj ections 
should  include  frequency  and  volume  data.  They 
are  optionally  included  in  the  conceptual  schema 
expressed  in  terms  of  abstract  observable  pro- 
perties of  conceptual  schema  objects.  They  may 
be  of  use  to: 

internal  schema  designers 

automatic  internal  schema  generators 

DBMS  for  automatic  tuning  of  the  data- 
base in  ways  which  may  not  even  be  re- 
flected  in  the  internal  schema. 

Note  that  some  performance     related  information 
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could  be  provided  in  a  program  (i.e.,  in  EML  or 
external  schema) --e .g . ,  "I  intend  to  retrieve 
4000  employee  records"--which  will  cause  the 
DBMS  to  stage  the  data  to  high  speed  storage  or 
take  other  appropriate  action.     [ Lowe n thai ] 

The  SOURCE  clause  is  appropriately  categorized 
under  the  EXTERNAL  schema,  not  under  the  CONCEP- 
TUAL schema.  It  is  an  EXTERNAL  schema  component 
because  it  permits  construction  of  a  record  that 
contains  values  derived  from  relationships.  For 
example,  consider 


Figure  4 
SOURCE  clause  example 


DEPT 
ONAME 

Jl 

EMPL 
EDNAAAE 
ENAME 


EDNAME  has  as  SOURCE  DNAME.  This  permits  a  view 
(EXTERNAL  schema)  without  the  DEPT  record  and 
without  the  DE  set  in  which  an  employee's  dept 
name  (as  EDNAME)  is  still  available.  Contrary 
to  assertions  made  at  the  meeting  by  Frank  Mano- 
la  and  Charles  Bachman,  the  SOURCE  clause  does 
not  act  as  a  clause  associated  with  data  base 
integrity  (if  it  did  it  would  properly  be  asso- 
ciated with  the  conceptual  schema  as  well).  The 
JOD  specifically  states  that  SET  SELECTION  must 
be  used  to  accomplish  this  kind  of  integrity. 
SOURCE  allows  one  to  tell  the  DBMS — "replicate 
an  item  here  and  give  its  replication  a  new 
name."  Since  one  could  do  this  for  all  items  in 
records  that  own  sets  ( into  the  members)  without 
change  to  the  enterprise  model,  or  to  allowable 
data,  it  seems  not  to  be  associated  with  the 
conceptual  schema.     [  Gerr itsen] 
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The  resource  allocation  category  names  organizational 
units  appropriate  for  managing  system  resources,  and  con- 
trols the  assignment  of  occurrences  of  the  declared  data 
structures  to  those  units.  The  syntactic  elements  of  this 
category  include  the  AREA  NAME  clause  and  the  WITHIN  area 
clause . 

The  administration  category  names  and  provides  for  the 

invocation  of  DBA  supplied  procedures.  The  syntactic  ele- 
ments of  this  category  include  the  CALL  clause. 

The  workshop  noted  that  database  procedures,     like  all 

other  forms  of  user  exits,  reduce  portability.  In  that 
sense,  they  serve  to  reduce  data  independence. 


2.5  CONCLUSIONS 


The  group  agreed  to  the  following  basic  conclus ions; 

-    No  intensional  definition  of    data     independence  is 
possible . 

The  two  schema  and  the  three  schema  architectures 
provide  the  same  degree  of  theoretical  data  indepen- 
dence . 

The  three  schema  architecture  provides  a  greater  de- 
gree of  practical  data  independence. 
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3.      IMPLEMENTATION  ISSUES 


3.1  INTRODUCTION 


While  the  participants  in  the  first  workshop  had  been 
charged  with  the  examination  of  a  single  issue,  data  in- 
dependence, the  second  workshop  was  charged  with  four  large 
areas.  These  areas  relate  very  closely  to  implementation 
problems,  in  contrast  to  the  more  nearly  theoretical  con- 
cerns of  the  first  workshop.  These  areas  were  not  addressed 
in  any  detail  by  the  ANSI/SPARC  study  group  report,  nor  by 
the  latest  CODASYL  Journals  of  Developnent .  The  main  task 
of  the  second  workshop  was  to  find  the  pitfalls  in  the 
ANSI/SPARC  framework  and  pitfalls  in  the  CODASYL  JOD  specif- 
ications as  an  implementation  of  that  framework  in  the  con- 
text of  these  four  areas. 

Not  all  the  participants  agreed  that  the  CODASYL 
specifications  were,  or  could  become,  an  implementation  of 
the  ANSI/SPARC  framework.  The  following  positions  were  sub- 
mitted : 

Comparison  of  CODASYL  and  ANSI/SPARC 
Architectures .  I  take  strong  disagreement  with 
the  premise  that  CODASYL  78  represents  a  valid 
embodiment  of  the  A/S  three  schema  database  ar- 
chitecture. While  there  are  three  schemas  in 
CODASYL  78,  there  are  not  the  required  three 
schemas:  it  is  not  accurate  to  equate  the  DDL 
schema  with  the  conceptual  schema  nor  the 
subschema  with  external  schema.  The  DDL  schema 
is  not  quite  the  conceptual  schema.  Too  much  of 
the  internal  level  remains,  even  with  factoring 
out  pointers  and  indices  into  the  DSDL.  And  too 
much  of  what  belongs  at  the  conceptual  level 
remains  unspecified,  or  provided  only  through 
vaguely  specified  CALL  LOCK,  or  BY  PROCEDURE 
references.  Most  damaging  is  the  inadequacy  of 
the  subschema  facility  provided  by  CODASYL  71, 
which  the  revisions  of  78  did  not  address.  This 
cannot  be  considered  an  ANSI/SPARC  external 
schema  facility:  it  restricts  subschemas  to 
resemble  too  closely  DDL  schema,  thus  leaving 
conceptual  and  internal  constructs  visible  in 
the  user  interface.  This  is  not  a  small  objec- 
tion, nor  is  it  easily  fixed;  by  closely  linking 
subschema  data  access  to  DDL  schema  records  and 
sets,     serious     limitations     have  been  placed  at 
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the  level  and  power  of  available  IML  statements. 
This  in  turn  has  serious  implications  for  pro- 
gramming effort  and  productivity,  channel  traff- 
ic in  distributed  processing,  and  security  and 
integrity.     [  demons] 

Lefkovits  noted: 

The  proper  issue  to  be  discussed  here  is  not 
whether  CODASYL  78  is  an  adequate  representation 
of  the  ANSI/SPARC  architecture,  but  rather 
whether  CODASYL  78  sits  into  that  architecture. 
In  my  opinion  it  is  obvious  that  the  answer  to 
whether  one  can  equate  CODASYL  78  to  ANSI/SPARC 
must  be  a  categorical  negative;  for  example,  CO- 
DASYL 78  is  a  two  schema  structure  and  whereas 
there  is  nothing  in  CODASYL  78  that  inhibits  the 
introduction  of  a  conceptual  schema,  this  has 
not  been  done  to  date.  On  the  other  hand,  there 
is  nothing  in  CODASYL  78  which  appears  to  be  in- 
consistent with  the  ANSI/SPARC  architecture  and 
as  such,  CODASYL  78  does  not  inhibit  any  of  the 
benefits  that  can  be  accrued  from  this  architec- 
ture. [Lefkovits] 


3.2     DATABASE  ENVIRONMENT 


The  workshop  discussed  the  impact  of  current  hardware 
and  software  technology  research  on  future  database  prac- 
tice. The  hardware  developments  discussed  were  database 
distribution  and  specialized  database  machines  ( " backends" ) . 
The  software  developments  were  structured  programming,  modu- 
lar design,  and  abstract  data  types.     [LISK],    [GUTT] . 

3.2.1  Distributed  Databases. 

The  workshop  discussed  two  questions:  one  referencing 
ANSI/SPARC  and  one  referencing  CODASYL.  The  ANSI/SPARC 
framework  was  seen  by  some  workshop  participants  as  being 
well  suited  to  the  distributed  database  environment.  Others 
thought  that  the  ANSI/SPARC  report  implied  a  single  coherent 
database.  Database  distribution  was  said  to  be  a  problem 
similar  in  nature  to  the  earlier  problem  of  interrelated 
data  on  disparate  storage  media  and  of  differing  storage 
structure . 

Lefkovits  made  this  comment  on  the  incompleteness  of 
the  ANSI/SPARC  architecture: 
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The  ANSI/SPARC  architecture  does  not  address  it- 
self to  conventional  file  organization  and  dis- 
tributed data  (databases  and/or  files).  These 
two  subjects  are  similar  in  some  ways  as  they 
both  address  data  located  outside  a  single  data- 
base. There  is  nothing  in  the  current  document 
that  contradicts  such  future  extensions,  other 
perhaps  than  the  fact  that  the  role  of  the  data 
dictionary/directory  will  have  to  be  augmented 
substantially  to  reflect  these  changes.  In  an 
identical  manner  CODASYL  78  does  not  address 
these  subjects  either,  but  it  appears  that  the 
same  extensions  that  could  be  made  to  the 
ANSI/SPARC  architecture  could  be  included  in 
some  future  version  of  the  CODASYL  specifica- 
tions.      Lef  kov  its] 

The  ANSI/SPARC  conceptual  schema  represented  to  most  of 
the  workshop  a  good  vehicle  for  coordinating  separate  data 
models  and  DBMS,  but  distribution  itself  was  felt  to  have 
ramifications  throughout  the  framework.  It  was  not  isolated 
in,  for  example,  the  internal  schema  or  its  mappings. 

The  workshop  was  in  near  agreement  that,  in  its  current 
state  of  development,  the  CODASYL  specifications  could  not 
adapt  well  to  a  distributed  database  environment.  The  focus 
of  the  criticism  was  the  application  language  interface,  the 
EML.  Discussion  of  the  schema  DDL  went  little  beyond  an  ap- 
parent belief  that  it  presented  no  impediments  to  distribu- 
tion. No  grounds  for  such  a  belief  were  offered.  No  dis- 
cussion at  all  addressed  the  adaptability  of  the  DSDL  to  the 
description  of  the  distributed  environment. 

The  criticism  of  the  EML  was  that  it  was  too  "low  lev- 
el." This  criticism  was  the  strongest  result  of  the  discus- 
sion of  Database  Environment  and  reappears  in  other  areas. 
Therefore,  a  summary  of  the  definitions  and  arguments  and 
the  positions  submitted  by  individuals  have  been  collected 
and  appear  later  in  the  Database  Environment  area. 

Within  the  discussion  on  distributed  databases,  it  was 
pointed  out  that  minimization,  or  at  least  reduction,  of 
channel  or  network  traffic  (the  amount  of  data  transmitted 
between  nodes  of  a  distributed  data  network)  was  a  signifi- 
cant goal  of  any  distributed  DBMS,  since  the  cost  of  that 
traffic  may  well  be  the  limiting  factor  in  system  use.  The 
EML  of  CODASYL,  it  was  thought,  would  increase  that  traffic. 
The  global  intention  of  the  application  programmer  cannot  be 
expressed  in  the  CODASYL  DML.  This  leaves  no  alternative  to 
the  implementation  policy  of  retrieving  each  record  request- 
ed by  the  program  so  that  the  selection  and  aggregation 
(AVG,     MAX,     etc.)      logic     embedded     in     the     program  can  be 
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exercised.  A  higher  level  EML  would  allow  these  low  level 
operations  to  be  distributed.  Although  this  does  not  de- 
crease the  total  amount  of  computation  performed,  it  tends 
to  decrease  the  volume  of  data  transmitted.  The  amount  of 
decrease  depends  on  the  amount  of  selection  and  aggregation 
implied  by  the  query  and  by  the  amount  of  cross- node  compar- 
isons made.  (A  cross-node  comparison  references  data  items 
stored  at  distant  nodes  in  the  network.)  The  following  posi- 
tions were  submitted  on  this  issue: 

A  low-level  DML  (such  as  CODASYL  DBTG)  is  inap- 
propriate for  a  distributed  DBMS  because  the 
low-level  operators,  when  applied  to  remote 
data,  result  in  excessive  network  traffic.  It 
is  necessary  to  have  processes  at  the  remote 
system  to  select  and  aggregate  data  before 
transmission.  If  a  high-level  DML  is  used,  the 
system  can  construct  the  appropriate  processes 
to  be  run  at  the  different  sites  involved. 

The  alternative  of  having  the  programmer  code 
separate  low-level  processes  at  each  site  is  not 
appealing  since: 

1.  the  programmer  must  be  explicitly  aware 
of  data  location,  the  programs  cannot 
survive  data  reorganization; 

2.  the  coding  is  much  more  difficult  and 
must  concern  itself  with  intersite  com- 
munication; 

3.  the  low-level  code  must  be  bound  to  par- 
ticular sites.  If  a  site  fails,  even 
though  redundant  data  is  available  else- 
where, the  code  cannot  execute.  A 
high-level  global  program  could  automat- 
ically use  alternate  data  copies; 

4.  with  a  high-level  program,  the  compiler 
can  optimize  the  global  execution  based 
on  database  statistics  (not  possible 
with  a  low- level  implementation) . 
[ Shipman] 

The  subschema  facility,  by  retaining  a  close 
link  to  DDL  schema  records  and  sets,  places 
severe  limitations  on  possible  EML.  CODASYL  78 
represents  an  improvement  over  CODASYL  71:  for 
example,  set  selection  by  structural  constraint 
permits  returning  to  the  user  only  those  records 
whose  contents  are  of  interest;     still,  placing 
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this  feature  in  the  DDL  schema,  and  thus  requir- 
ing increased  implementation  overhead,  will  lim- 
it its  usefulness.  Most  other  possibilities  for 
reducing  the  volume  of  data 

transm itted--qual if ication ,  statistical  reduc- 
tion, aggregation--are  not  supported  by  DDL 
schema  to  subschema  mappings.  Thus  these  func- 
tions will  be  performed  by  most  language  pro- 
grams, requiring  that  the  necessary  data  be 
passed  to  the  user's  computer  for  processing, 
rather  than  being  reduced  before  transmission. 
This  results  in  a  substantial  increase  in  the 
volume  of  data  to  be  transmitted.  [demons] 

Some  argued  that  global  optimization  of  CODASYL  DML 
programs  would  be  capable  of  decreasing  the  volume  of  net- 
work traffic  which  might  otherwise  result.  However,  no  one 
indicated  how  that  optimization  might  proceed,  nor  gave  any 
reference  to  work  which  had  succeeded  in  finding  or  attempt- 
ed to  find  such  an  optimization  technique. 

3.2.2  Special  Purpose  Machines. 

Do  associative  memories  make  the  definition  of  access 
paths  like  CODASYL  sets  obsolete?  Does  the  navigational  DML 
make  the  use  of  associative  memories  impossible?  Which 
functions  of  a  DBMS  should  be  allocated  to  a  backend 
machine?  What  should  be  the  interface  between  the  backend 
and  the  host?  How  does  this  fit  into  the  ANSI  framework  and 
the  CODASYL  specifications?  This  series  of  questions  pro- 
voked a  discussion  on  the  means  by  which  the  specialized 
devices,  e.g.,  RAP  [OZKA],  CASSM  [COPE],  etc.,  could  be  in- 
corporated into  the  ANSI/SPARC  framework  or  into  the  CODASYL 
specifications.  The  term  "database  machine"  encompassed  a 
large  variety  of  specialized  non-numeric  processors  includ- 
ing associative  memories,  text  processors,  sorting  machines, 
and  so  forth.  The  associative  memories  under  discussion 
were  disk  based,  "logic  per  cell"  systems  which  respond  to 
requests  for  data  records  given  values  of  constituent  data 
items  rather  than  addresses,  and  which  are  capable  of  limit- 
ed logic  and  computation.  (The  eventual  appearance  of 
storage  technologies  other  than  disk,  specifically  bubble 
memories,  in  these  applications  was  noted  by  one  of  the  par- 
ticipants.) Associative  memories  are  examples  of  backend 
machines . 

The  workshop  noted  that  associative  memories  eliminated 
the  need  for  complex  stored  data  structures  such  as  inverted 
files,  indices,  and  so  forth.  This  function  of  the  CODASYL 
set  is  not  useful  in  an  associative  memory  environment.  As 
to  the  adaptability  of  the  CODASYL  Schema  DDL  or  data  model 
to    associative    memories,     opinions  differed.     Some  thought 
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that  the  relational  model  was  best  suited  to  these  devices. 
Su  reviews  these  arguments: 


Associative  memories  with  context  addressing  ca- 
pability are  more  suitable  for  the  relational 
model  than  for  the  other  models  because 


1.  the  relational  model  represents  entity 
relationships  by  explicitly  storing  the 
primary  keys  of  the  related  entities  (by 
content)  rather  than  the  addresses  of 
the  entities,  and 

2.  the  representation  of  relations  as  flat 
tables  matches  very  closely  to  the  ar- 
chitecture of  associative  memories-- thus 
the  effort  of  mapping  the  information 
structure  of  the  data  to  the  structure 
in  the  associative  memories  is  minimal. 


However,  one  should  not  jump  to  the  conclusion 
that  the  relational  model  is  superior  than  other 
models  for  the  following  reasons.  First,  the 
capability  of  associative  memories  have  not  yet 
been  fully  exploited.  With  some  hardware  exten- 
sion, associative  memories  can  handle  more  com- 
plex data  structures  such  as  trees  and  network 
efficiently.  For  example,  the  CASSM  system  is 
built  to  process  hierarchical  structures  and 
contains  pointer  transfer  capabilities  which  are 
very  suitable  for  data  structures  using 
pointers.  Second,  the  hardware  should  be  built 
and  modified  to  meet  the  application  needs. 
Hardware  consideration  should  not  be  a  major 
factor  for  determining  the  suitability  or  the 
superiority  of  data  models.  With  the  drastic 
reduction  of  the  hardware  cost,  one  can  expect 
that  new  hardware  (database  machines)  can  be 
built  to  support  any  data  model  which  the  users 
deem  suitable  for  their  applications.     [ Su] 

On  the  EML  question,  the  workshop  again  criticized  the 
CODASYL  specifications  as  being  too  low  level.  Where  some 
had  thought  that  optimization  could  be  of  assistance  in  the 
distributed  environment,  the  situation  was  thought  to  be 
worse  in  regard  to  associative  memories.  The  parallelism  of 
these  devices  can  only  be  exploited  through  the  use  of  a 
higher  level  EML.  When  there  are  "multiple  hits"  in  the  as- 
sociative memory,  it  was  thought  by  some,  the  precision  of  a 
navigational  DML  would  be  required.  This  was  taken  to  mean 
that  record- by-record  or  tuple- by- tuple  serial  operations 
are  a  required  part  of  any  DML.       Others    thought     that  the 
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ability  of  a  higher  level  DML  to  deal  with  a  set  rather  than 
an  individual  returned  as  the  result  of  a  query  was  a  power- 
ful argument  for  the  use  of  such  EMLs .  It  was  noted  that 
associative  memories  facilitate  the  implementation  and  im- 
prove the  performance  of  high  level  DMLs . 

The  participants  generally  thought  that  the  more  a 
backend  machine  did  in  off-loading  work  from  the  host  com- 
puter, the  better.  Clearly,  there  must  exist  some  limit  to 
this  off-loading,  otherwise  the  backend  simply  becomes  the 
host.  Most  of  the  comments  made  by  the  workshop  to  this 
point  appear  in  the  position  by  Su : 

Which  DBMS  functions  should  be  assigned  to  the 
main  frame  computer  and  which  to  the  backend 
processor{ s)  would  depend  heavily  on  such  fac- 
tors as  the  relative  speed,  memory  size,  job 
load,  peripheral  device  types  and  speeds,  etc. 
of  the  processors  involved.  However,  in  gen- 
eral, we  should  expect  the  main  frame  to: 

1.  carry  out  the  program  or  query  transla- 
tion task; 

2.  support  the  end-user  facilities 
( ANSI/SPARC) ; 

3.  handle  schema  mappings,  terminal  commun- 
ications, job  scheduling  and  formatting 
of  output  data  to  the  users; 

4.  control  the  program  execution;  and 

5.  construct  and  maintain  the  data  diction- 
ary. 

The  backend  processor( s)  should  perform  the  ac- 
tual data  retrieval  and  manipulation  operations, 
handle  the  memory  management,  schedule  and  con- 
trol the  query  or  DML  execution,  enforce  the  in- 
tegrity and  security  rules  and  implement  the 
data  aggregate  functions.  The  data  to  be  passed 
from  the  backends  to  the  main  processor  should 
be  the  result  of  a  high-level  data  operation  (a 
set  of  records)  rather  than  of  a  low-level  re- 
quest (one  record  at  a  time)  to  reduce  the 
amount  of  interference  generated  to  the  main 
processor.  [Su] 
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Goguen  added  this  comment: 

The  back-end  machine  concept  places  the  database 
management  function  on  a  separate  processor  with 
exclusive  access  to  the  database.  The  host 
machine  collects  data  management  requests  and 
transmits  these  across  an  interface  from  the 
host  to  the  back-end.  Status  and  results  are 
accepted  from  the  back-end  by  the  host  and  sent 
to  the  appropriate  application  programs. 

By  separating  the  data  management  functions  and 
placing  them  on  a  specialized  processor,  the  op- 
portunity exists  for  designing  a  high  level  log- 
^  ical     interface     between     the    host  and  back-end 

processors.  With  this  approach,  data  can  be 
transferred  between  different  host  processors 
without  reformatting  and  translation,  assuming 
the  back-ends  are  the  same.  Thus  data  conver- 
sion can  be  avoided. 

A  future  conceptual  schema  in  the  ANSI/X3/SPARC 
framework  could  provide  the  basis  for  a  Standard 
Data  Interchange  format  which  would  be  the  high 
level  logical  interface  between  the  host  and 
back-end  processors.  [Goguen] 

The  workshop  participants  appeared  to  agree  that  the 
ANSI/SPARC  framework  allowed  for  rapid  adoption  of  special- 
ized processors  as  they  become  available. 

3.2.3  Higher  Level  DML. 

Having  repeatedly  criticized  the  CODASYL  DML,  the 
workshop  saw  the  necessity  to  define  more  nearly  what  they 
meant  by  "high  level  DML."  A  first  attempt  was  that  in  a 
high  level  EML,  "you  tell  the  computer  what  you  want;" 
whereas,  in  a  low  level  EML,  "you  tell  the  computer  how  to 
get  what  you  want."  This  was  refined  to  mean  that  a  low- 
level  EML  was  procedural  or  navigational,  and  a  high  level 
DML  was  non- proced ur al  or  non-nav igational .  However,  these 
terms  did  not  appear  to  have  clearly  understood  definitions. 
The  following  positions  were  submitted: 

A  "navigational"  interface  to  a  database  manage- 
ment system  is  an  interface  which  accesses  one 
record  at  a  time  according  to  a  specific  access 
path;  e.g.,  "get  parent"  or  "find  next  in  set." 
A  " non-nav igational"  interface  tends  to  operate 
on  sets  of  records  and  to  be  independent  of  ac- 
cess path;  e.g.,  "join     sales     to     customers  by 
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matching  their  account  numbers."  Like  "pro- 
cedural," the  term  "navigational"  defines  a  con- 
tinuum on  which  I  would  characterize  the 
hierarchic  and  network  data  models  as  more  navi- 
gational, and  the  relational  data  model  as  less 
navigational,     [  Chamberlin] 

If  the  word  "navigation"  is  to  be  used  to  mean 
any  type  of  program-controlled  data  traversal, 
then  all  DMLs  are  navigational.  They  may  differ 
only  in  the  degree  of  "  nav  igational  ity ."  This 
is  because  all  DMLs  are  used  to  traverse  and 
manipulate  databases  in  accordance  with  the  in- 
formation structures  defined  in  the  external 
schemas.  However,  some  of  the  structural  pro- 
perties of  a  database  defined  in  an  external 
schema  are  syntactic  oriented  and  are  inherent 
from  the  specific  data  model  used.  They  are  not 
necessary  for  defining  the  semantics  of  the 
data.  For  example,  the  concepts  of  database 
key,  area  assignment,  chaining  and  sequential 
access  of  set  members,  etc.  have  more  to  do  with 
the  access  path  structure  of  the  data  than  their 
semantics.  If  we  distinguish  that  part  of  the 
structural  properties  defined  in  external  sche- 
mas which  are  necessary  for  describing  the  se- 
mantics of  databases  from  that  part  which  are 
not,  we  can  then  define  a  navigational  EML  as  a 
language  which  allows  the  user  to  guide  the  data 
traversal  through  the  excess  structural  proper- 
ties and  a  non- nav  igational  DML  as  one  which  al- 
lows the  user  to  retrieve  and  manipulate  data 
based  only  on  the  semantic  properties  of  the  da- 
tabase.    [  Su] 

It  is  difficult  to  establish  that  there  exists 
any  language  which  is  not  navigational  at  all; 
probably  the  fact  is  that  some  languages  are 
more  navigational  than  others.  A  potential 
measure  of  the  degree  to  v^ich  a  language  is  na- 
vigational is  the  following:  "Every  language 
statement  implicitly  contains  in  its  semantics  a 
data  structure  which  it  addresses;  the  closer 
this  data  structure  is  to  the  actual  schema 
structure,  the  higher  is  the  degree  to  v^ich  the 
language  is  navigational."    [Lef  kov  its] 

Rather  than  directly  address  level  of  EML,  which 
seems  to  be  an  ill-defined  and  elusive  concept, 
I  address  related  subtopics:  iteration  and  na- 
vigation. Iteration  in  DML  is  the  ability  to 
write  a  single  EML  statement  that  retrieves  and 
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processes  all  desired  record  occurrences.  For 
example,  when  retrieving  the  arithmetic  average 
of  salary  of  all  employees  of  a  department,  DML 
iteration  would  permit  this  to  be  done  without 
explicit  host  language  looping  and  arithmetic 
statements.  Although  such  facilities  are  not 
provided  in  the  current  CODASYL  DML,  they  could 
be  added  with  few  changes  to  DDL  schema  and 
subschema  facilities.  I  define  database  naviga- 
tion to  be  the  process  by  which  the  user  issues 
DML  statements  to  make  associations  between  or 
among  desired  records.  This  definition  is  in- 
dependent of  data  model  and  language  used — it 
applies  equally  to  navigation  based  on  set 
membership,  algebraic  joins  or  SEQUEL  mappings. 
Although  all  implementations  require  some  navi- 
gation, user  navigation  is  bad,  because  it  is 
time-consuming,  difficult,  and  likely  to  result 
in  errors.  The  alternative  is  replacing  EML  as- 
sociation statements  with  DDL  statements,  that 
is,  making  data  associations  and  accesses  part 
of  the  external  schema  definition.  This  is 
readily  done  as  an  extension  to  the  relational 
model,  as  shown  by  System  R  and,  to  a  greater 
extent,  my  own  research.  It  has  not  been  done 
by  CODASYL:  no  such  facilities  are  included  in 
the  subschema  facility.  Moreover,  extensions 
needed  to  accomplish  this  may  be  quite  complex 
due  to  the  basic  set  architecture  chosen;  the 
SELECTION  clause  of  CODASYL  78  DDL  demonstrates 
this.  [demons] 

Zaniolo  provided  an  overview  that  linked  several  of  the 
topics  in  order  to  reveal  additional  insights: 

The  ANSI/X3/SPARC  architecture  is  conducive  to 
data  independence,  powerful  end  user  facilities 
and  high  level  DMLs  suited  for  structured  pro- 
gramming. Its  framework  will  enable  a  DBMS  to 
respond  well  to  environment  changes  including 
distributed  data  and  database  machines.  The 
following  features  are  most  important  in  realiz- 
ing the  previous  benefits:  three  levels  of 
schemas ,  multi-model  external  schemas,  external 
schemas  derived  from  other  external  schemas, 
dictionaries.  While  CODASYL  78  uses  three  lev- 
els of  schemas  it  does  not  provide  the  remaining 
facilities  which,  I  believe,  are  necessary  to 
realize  the  benefits  previously  described.  I 
see  no  conceptual  problem  in  adding  a  dictionary 
facility  specification  to  the  CODASYL  78  docu- 
ments.    However,  the  addition  of    more  powerful 
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external  schema  facilities  could  produce  strong 
repercussions  in  the  schema  DDL  required  to  sup- 
port such  mapping  flexibility. 

The  objective  of  this  statement  is  not  to  pass 
judgment  on  the  relative  merits  of  CODASYL  78 
and  ANSI/X3/SPARC,  but  to  suggest  that  a  dis- 
tinction between  the  two  would  be  useful  in 
clarifying  the  discussion  and  the  conclusions  of 
this  workshop. 

HIGH  LEVEL  DMLs;  These  languages  are  very  useful 
since  they  facilitate  application  programming 
and  promote  practical  data  independence.  The 
discussion  at  the  workshop  focused  on  two  as- 
pects of  high  level  DMLs: 

i.  Aggreg ate  Operation  A s pe c t .  This  refers 
to  the  capability  of  manipulating  and 
retrieving  a  whole  set  of  records, 
selected  on  the  basis  of  their  content, 
in  a  single  DML  statement. 

ii.  Non-Navigational  Aspect.  (The  defini- 
tion of  navigational  languages  is  given 
below) . 


Current  query  languages  (e.g.  the  QLP  language 
for  EMS/1100  of  Sperry  Univac)  support  aggregate 
operations  against  network  schemas.  However, 
non-nav  igational  EMLs  will  probably  require 
external  schemas  different  from  the  network 
subschemas  of  CODASYL  78. 


NAVIGATIONAL  LANGUAGES;  These  are  languages 
based  upon  the  availability  of  access  paths  in 
the  application  schema.  Thus  the  user  identi- 
fies paths  and  records  of  interest  and  specifies 
his  query  as  steps  thru  them.  While  access 
paths  can  be  defined  as  purely  logical  con- 
structs, they  have  traditionally  been  regarded 
as  expression  of  underlying  physical  structures, 
thus  impairing  data  independence.  Moreover  a 
query  on  a  relationship  not  directly  supported 
by  an  access  path  is  expressed  differently  and 
less  conveniently  than  a  query  on  a  supported 
relationship.     [Z  aniolo] 

The  participants  agreed,  as  implied  by  Chamberlin' s 
statement,  that  a  high  level  DML  takes  as  its  operands  and 
returns  as  its  results  aggregates  of  records,  tuples,  enti- 
ties,    relationships,     or  whatever,  rather  than  individuals. 
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Although  no  consensus  was  reached  on  how  non-procedural ity 
is  recognized,  however,  it  was  agreed  that  procedural ity  and 
navigation  were  matters  of  degree.  Some  participants  felt 
that  insofar  as  the  means  of  establishing  a  relationship 
( CODAS YL  sets  or  relational  joins)  between  objects  (records 
or  tuples)  appeared  in  the  EML,  the  EML  was  navigational. 
This  led  to  the  position,  not  submitted  as  a  statement,  that 
the  hierarchical  model  allowed  for  the  least  navigational 
DML.  The  workshop  did  not  agree  as  to  whether  the  choice  of 
data  model  dictated  the  degree  of  navigation  or  procedural  i- 
ty  on  an  associated  DML.  Some  participants  thought  that  a 
non-procedural ,  non- nav  igational  EML  for  use  with  a  CODASYL 
organized  database  was  conceivable;  ethers  did  not.  Dave 
Shipman  took  what  might  be  called  a  higher  level  view  of 
higher  level  EMLs: 

What  is  a  H ig h-Level  Language?  A  higher- level 
language  is  one  in  which  the  contents  of  the 
language  more  closely  resemble  the  terms  in 
which  the  user  thinks  of  his  problem.  This  is 
accomplished  by  embedding  the  semantics  of  the 
problem  domain  within  the  programming  language 
itself,  in  terms  of  DBMS  languages.  This  means 
that  programs  in  a  high-level  EML  express  the 
intent  of  the  query  or  update,  rather  than  the 
sequence  of  low-level  operations  needed  to  carry 
it  out.  A  high-level  EML  pre-supposes  a  DDL  fa- 
cility which  allows  the  semantics  of  the  appli- 
cation to  be  reflected  in  data  descriptions 
rather  than  in  each  DML  program  which  references 
those  descriptions.  A  high-level  DDL  in  turn 
incorporates  constructs  natural  to  modelling  the 
real  world  enterprise.  High  level  languages  are 
easier  to  write,  debug,  document,  maintain,  and 
administer.  In  addition,  because  a  high-level 
language  is  less  bound  to  any  particular 
hardware  or  software  implementation,  new  techno- 
logies can  be  absorbed  without  significant  im- 
pact on  existing  application  programs. 
[ Shipman] 

3.2.4  Advanced  Programming. 

Do  either  the  ANSI/SPARC  framework  or  the  CODASYL 
specification  inhibit  database  programmers  in  the  use  of  the 
methodology  known  as  structured  or  modular  programming?  Can 
the  results  of  research  into  abstract  data  types  be  applied 
to  database  problems? 
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These  questions  attempted  to  elicit  the  participants' 
opinions  of  recent  advances  in  software  engineering  that 
have  some  impact  on  database  engineering.  Discussion  of  the 
topic  was  brief.  It  was  felt  that  the  ANSI/SPARC  framework 
"blends  nicely  into  the  ideas  of  abstract  data  type"  but 
there  were  no  further  details  forthcoming.  Concerning  the 
CODASYL  specifications,  it  was  remarked  that  their  complexi- 
ty did  not  lead  to  elegant  programming  style. 

A  number  of  the  participants  felt  the  need  for  database 
operations  defined  on  the  semantics  of  the  data  and  that  the 
work  in  abstract  data  types  could  be  used  as  an  example. 
Consider  the  contrast  between  the  commands  STORE  EMPLOYEE 
RECORD  and  HIRE  EMPLOYEE.  The  first  is  expressed  in  terms 
of  the  data  requirements  of  the  DBMS.  The  second  is  ex- 
pressed in  the  semantics  of  the  enterprise.  The  second  com- 
mand might  include  side  effects  such  as  establishing  rela- 
tionships ( Department- Employee)  ,  updating  counts  (Number  of 
Employees  in  Department),  and  so  forth. 


3.3     END  USER  FACILITIES 


The  CODASYL  specifications  to  date  have  not  addressed 
the  end  user's  needs.  The  ANSI/SPARC  study  group  identified 
four  interfaces  for  the  end  user,  but  did  not  specify  them 
in  detail . 

3.3.1  Is  the  ANSI  user  analysis  correct?. 

The  ANSI/SPARC  report  recognizes  four  separate  inter- 
faces, (numbered  8  through  11),  for  the  end  user.  They  are 
the  Inquiry  Processor  (small  output  volume,  interactive); 
Report  Writer  (large  output  volume,  batch);  Update  Proces- 
sor; and  Parametric  interface. 

The  majority  of  the  participants  considered  this 
analysis  to  be  incorrect.  Don  Chamberlin  points  out  in  the 
following  statement  that  the  application  programmer  inter- 
face, interface  7,  is  presented  as  being  parallel  to  the  end 
user  interfaces,  which  he  considers  an  error: 

Role  of  Appl icat ion  Programming  Interface     in  a 
DBMS 

The  ANSI/SPARC  framework  for  a  DBMS  [ANS77] 
treats  the  application  programming  interface 
(API)  as  independent  of  the  query,  report  gen- 
erator, and  several  other  interfaces  (Figure  5). 
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An  alternative  which  has  several  advantages  is 
to  make  the  API  sufficiently  rich  to  enable  pro- 
grams to  be  written  in  support  of  query,  report 
generation,  etc.  (Figure  6). 


Figure  6 
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Such  an  enriched  API  would  need     facilities  for 


submitting  ad- hoc  queries  and  updates  for  execu- 
tion (e.g.,  the  EXECUTE  statement  of  SQL 
[CHAM]).  A  system  structured  according  to  Figure 
6  permits  a  variety  of  query  and  report  genera- 
tor interfaces  to  be  developed  and  supported  by 
programs,  to  meet  a  variety  of  end  user  require- 
ments.    [ Chamberlin] 

Other  participants  expressed  the  view  that  the  four 
separate  interfaces  of  the  ANSI/SPARC  diagram  represented  a 
data  processing  solution,  in  sharp  contrast  to  a  user 
oriented  solution.  In  other  words,  these  four  operations 
appear  separate  and  distinguishable  to  a  computer- 
-cnowledgeable  data  processing  professional,  but  not  to  an 
end  user. 

Henry  Lefkovits  drew  a  diagram  (Figure  7)  for  the 
design  of  an  End  User  Facility  or  Facilities.  The  diagram 
results  from  the  work  of  the  End  User  Committee  of  CODASYL 
of  which  Dr.  Lefkovits  is  the  chairman.  Although  thc-t  com- 
mittee has  yet  to  publish  a  jDurnal  of  Development,  their 
interim  report  [EUFTG]  indicated  that  they  were  developing 
specifications  for  a  facility  based  on  the  "forms  approach." 
Explanation  of  this  approach,  in  v^ich  the  database  is 
viewed  as  a  "virtual  file  cabinet"  within  v^ich  the  data  is 
organized  into  forms  resembling  the  familiar  paper  forms  on 
which  business  has  so  long  relied,  led  the  workshop  to 
renewed  criticism  of  parts  of  the  CODASYL  JOD  specifica- 
tions. This  discussion  is  given  in  the  answer  to  the  next 
question  presented  to  the  workshop. 
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Figure  7 
Design  of  an  End  User  Facility 
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3.3.2  Does  subject  architecture  impede  EUF? . 

Many  participants  agreed  that  the  subject  architecture 
(the  three  schema  framework  of  ANSI/SPARC  implemented  in  the 
CODASYL  78  language)  presented  impediments  to  the  construc- 
tion or  use  of  end  user  facilities. 

The  need  recognized  by  the  End  User  Committee  to  define 
an  end  user  view  of  the  database  reinforced  the  beliefs  held 
by  many  of  the  workshop  participants  that  the  CODASYL  COBOL 
Subschema  DDL  was  not  sufficient  to  serve  the  needs  of  a 
complete  external  schema  defining  language.  It  was  stated 
by  one  of  the  participants  that  the  "subschema  facility  ... 
is  fundamentally  misdesigned  [and]  entirely  inappropriate" 
for  use  as  an  external  schema  facility.  Don  Chamberlin  sum- 
marized  the  views  of  many  in  the  following  statement: 

1.  CODASYL  subschema  DDL  is  not  a  very  rich 
facility  for  defining  views  of  stored 
data.  Essentially,  the  defined  views 
must  be  simple  subsets  of  the  schema. 

2.  Because  CODASYL  DDL  lacks  the  "closure" 
property,  it  does  not  permit  "cascading" 
of  views  (views  defined  on  top  of 
views)  .     [ Chamber 1  in] 

The  ability  to  cascade  was  seen  by  some  as  more  a 
matter  of  convenience  than  necessity:  the  final  view  or 
subschema  or  external  schema  could  be  defined  directly  from 
the  schema.  However,  it  was  observed  that  forming  views  on 
views  increased  security  and  access  control,  and  provided  a 
more  flexible  division  of  labor  between  enterprise  and  ap- 
plication administrators,  as  those  roles  are  defined  by 
ANSI/SPARC. 

Some  debate  addressed  whether  a  sufficiently  powerful 
external  schema  facility  to  support  most  users'  needs  could 
be  designed  to  interface  with  the  CODASYL  schema  DDL,  with 
the  majority  apparently  believing   that  it  could. 

It  was  remarked  that  neither  the  ANSI/SPARC  nor  CODASYL 
documents  made  any  mention  of  text  processing  nor  of  message 
management  (communicating  among  end  users)  and  that  these 
were  important  features  of  an  end  user  facility. 

3.3.3  Is  EUF  different  from  programming  interface? 

The  participants  were,  for  the  most  part,  convinced 
that  end  user  facilities  did  not  present  any  hinderences  to 
database  evolution  beyond  those  presented  by  application 
programs.     It     was     pointed     out,  however,  that  the  end  user 
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environment  is  likely  to  be  less  controlled  than  the  data 
processing  environment.  Ad  hoc  database  use  may  create  more 
unpleasant  "surprises"  than  might  occur  in  the  absence  of 
end  user  interfaces.  Good  intra-enterpr ise  communication 
and  special  attention  to  the  preservation  of  end  user  views 
are  two  ways  to  deal  with  surprises. 


3.4     DATA  DICTIONARIES 


In  the  center  of  the  block  diagram  published  by  the 
ANSI/SPARC  study  group  there  is  an  unlabeled  triangle  from 
which  emanate  a  large  number  of  the  interfaces  recognized 
and  described  by  that  group.  This  triangle  is  the  data  dic- 
tionary, and  its  role  is  apparently  crucial  to  the 
ANSI/SPARC  framework.  The  study  group  report  gives  no  de- 
tails of  a  dictionary  system  beyond  the  image  of  a  reposito- 
ry for  whatever  control  information  is  needed  by  the  DBMS. 

Recent  work  by  the  British  Computer  Society  Data  Dic- 
tionary Systems  Working  Party  [BCS]  presents  an  elaborate 
and  powerful  facility  which  they  called  "the  data  processing 
department's  own  database."  Members  of  the  workshop  were 
quick  to  point  out  that  the  data  dictionary/directory  (DD/D) 
should  serve  not  only  the  data  processing  department  but  the 
larger  user  community  as  well.  They  also  criticized  the 
ANSI/SPARC  report  for  failing  to  see  that  the  dictionary  in- 
terfaced to  the  human  users  of  the  database.  Most  of  the 
workshop  participants  agreed  that  a  data  dictionary  system 
had  two  distinct  aspects:  one  oriented  towards  use  by  the 
DBMS;  and  one  oriented  towards  use  by  human  beings.  The  na- 
ture of  the  data  stored  in  a  DD/D  and  the  services  performed 
by  it  reflect  this  dual  role. 

3.4.1  What  is  the  data  stored   in  a  DD/D?. 

Everything  about  the  data  stored  or  storable  in  the  da- 
tabase of  interest  to  the  enterprise.  Everything  except  the 
data  values  and  programs  themselves.     As  concrete  examples: 

The  three  schemas  and  their  associated  mappings  in' 
source  and  object  form. 

Security,  access  control,  authorization  mechanisms. 

Usage  information.     This  was  of  two     types:  static] 
information  as  to  which  data  items  are  used  by  which^ 
programs;  dynamic  information     concerning  frequency 
and  volume  of  use. 
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-  Validation:     range  checks,  permissible  values,  etc. 

-  Narrative  text  describing     data     items,  aggregates, 
etc . 

-  An  example  of  human  oriented  data  not  of  interest  to 
the  DBMS. 

An  argument  was  made  that  the  system  being  described  by 
this  list  was  broad,  too  all  encompassing.  It  seemed  the 
system  would  "do  everything  for  everybody"  although  none  had 
described  how  it  would  manage.  Of  the  DBMS-oriented  data, 
it  was  argued  that  since  any  functioning  DBMS  kept  that  in- 
formation, calling  it  a  data  dictionary  served  no  new  pur- 
pose. The  reply  was  that  the  processing  of  this  information 
for  results  not  derivable  from  any  single  piece  of  informa- 
tion is  a  new  service  provided  by  DD/D  systems. 

3.4.2  Languages  used  with  a  DD/D  system. 

Three  distinct  languages  were  proposed.  They  were  a 
customizing  language,  a  maintenance  language,  and  a  report- 
ing language.  This  analysis  was  accepted  by  the  workshop 
with  the  following  criticisms  and  clarifications: 

The  customizing  language  need  be  no  more  complex  than 
the  options  selected  at  system  generation.  The  only  custom- 
izing decisions  to  be  made  are  the  choices  of  optional  sys- 
tem services. 

Reporting  from  the  dictionary  is  conceptually  no  dif- 
ferent than  reporting  from  the  database.  The  same  facility 
can  be  used  to  do  both.  This  prompted  a  discussion  on  the 
merits  of  organizing  the  dictionary  as  a  database  under  a 
IBMS.  On  the  one  hand,  it  was  felt  that  the  dictionary  is 
information,  and  a  database  is  a  box  for  storing  informa- 
tion. It  seems  reasonable  to  store  the  information  about 
the  database  in  the  box.  On  the  other  hand,  some  partici- 
pants perceived  a  marginal  increase  in  security  if  the  dic- 
tionary were  not  stored  under  the  DBMS.  The  workshop  was 
reminded  that  some  functions  of  the  dictionary  were  of  use 
in  the  absence  of  a  E8MS.  Most  workshop  members  thought  the 
use  of  the  IBMS  to  store  the  dictionary  a  sensible  decision. 

Maintenance  of  the  dictionary,  it  was  agreed,  differed 
substantially  from  maintenance  of  the  database.  Two  causes 
of  this  difference  appeared: 

-  Some  of  the  maintenance  is  done  by  the  system  it- 
self. Examples  of  this  include  dynamic  usage  infor- 
mation as  previously  mentioned.  Other  processes  of 
the    system    may    maintain    some     information  in  the 
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dictionary.  The  workshop  did  not  discuss  the  impor- 
tance of  automatic  versus  manual  data  collection  in 
the  DD/D. 


-  The  effects  of  changes  in  the  dictionary  are  poten- 
tially much  greater  than  the  effects  of  changes  in 
the  database.  There  is  a  considerable  difference 
between  adding  a  domain  to  the  description  of  a  re- 
lation, and  adding  a  tuple  to  the  extension  of  a  re- 
lation. It  is  correct,  therefore,  to  restrict 
maintenance  of  the  dictionary  to  be  a  single,  con- 
trollable facility. 

3.4.3  Services  performed  by  the  DD/D. 

The  participants  suggested  several  basic  DD/D  services: 

-  Resolving  name  conflicts.  These  are  of  two  kinds: 
two  (or  more)  names  for  the  same  thing;  and  the  same 
name  for  two  (or  more)  different  things.  It  is  the 
effort  of  collecting  this  information  for  the  dic- 
tionary which  recognizes  these  conflicts. 

Generate  data  descriptions:  COBOL  FD's,  PL/1  data 
structures,  etc.  All,  or  nearly  all,  dictionary 
products  on  the  market  perform  this  service. 

Query  cost  estimation  and  access  path  selection. 
The  data  needed  for  this  service  includes:  Availa- 
bility and  selectivity  of  indices  (the  selectivity 
of  an  index  is  the  number  of  unique  key  values  di- 
vided by  the  number  of  objects  indexed);  the  cardi- 
nality of  the  stored  relations;  and  the  average 
chain  length.  The  workshop  did  not  define  the  con- 
ditions under  which  the  cost  of  maintaining  this 
data  outweighs  its  usefulness. 

Audit  trails.  Not  to  be  confused  with  the 
update/ change  logs  kept  for  database  backup  and 
recovery.     Two  types  of  audit  trails  were  discussed: 

1.  a  list  of  which  programs  are  capable  of  modify- 
ing which  data  items.  This  can  be  very  useful 
for  tracking  down  the  source  of  improper  data- 
base updates;  and 

2.  a  list  of  the  changes  made  to  database  programs 
in  production  and  development  systems. 

Impact  of  proposed  changes.  A  list  cross  referenc- 
ing the  data  items  (attributes,  entities,  and  rela- 
tionships) and  the  programs  which  access  them.  This 
can  help  the  database  administrator  avoid  mistakes. 
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-  Data  Dictionaries. 

Use  of  a  data  dictionary  can  be  a  signifi- 
cant factor  in  data  conversion.  The  data 
dictionary  provides  a  catalog  of  the  data, 
both  on  the  item  level  and  in  terms  of  as- 
sociations amongst  the  data.  Assuming  that 
a  complete  data  description  is  included  in 
the  data  dictionary,  the  data  conversion 
system  can  use  the  information  for  generat- 
ing the  appropriate  source  and  target  data 
descriptions  required  for  the  conversion. 
A  major  problem  in  conversions  to  date  has 
been  the  unavailability  of  this  level  of 
description  in  usable  form.  The  definition 
has  been  implicit  in  the  logic  of  the  ap- 
plication programs  or  the  data  structures, 
requiring  time  consuming  analysis  to  gen- 
erate the  data  description.  The 
ANSI/X3/SPARC  architecture  should  include 
provision  for  a  data  dictionary  facility. 
[ Goguen] 

-  DBMS  were  introduced  to  administer  and  control  data 
which  had  until  then  been  allowed  to  accumulate  in 
unmanaged  file  systems.  DBMS  succeeded  in  being 
data  management  systems  in  only  a  narrow  sense. 
They  manage  to  control  and  integrate  program  access 
of  the  data.  Data  dictionary  systems  are  now  being 
introduced  to  administer  and  control  database 
management  systems. 

In  summary,  the  areas     in    which     a    dictionary  system 
serves  a  function  were  seen  to  be  fourfold: 

-  Operational:  the  support  of  production  systems 
through  security,  validation,  and  audit  trails. 

System  development:  the  support  of  application  pro- 
grams under  development. 

-  System  installation:  can  be  used  to  hold  the  "before 
and  after"  descriptions  of  databases  being 
translated/converted . 

-  Conceptual  Schema  or  Database  Design:  useful  in  all 
stag  es . 
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3 . 5  TRANSLATION/CONVERSION 


The  established  user  of  a  DBMS  will  have  invested  a 
considerable  sum  in  the  data  and  the  programs  which  access 
and  maintain  it.  Inevitably,  changes  in  all  areas  affecting 
the  DBMS  will  threaten  that  investment.  These  changes  come 
from  the  enterprise  itself,  from  advancement  in  technology, 
or  from  growth.  The  object  is  to  minimize  the  dollar  cost 
of  these  changes. 

Two  approaches  can  be  distinguished.  These  are  tenta- 
tively called  the  physical  and  the  logical  approach.  ("Phy- 
sical" is  not  to  be  taken  as  a  synonym  for  " brute- force"  nor 
is  "logical"  to  be  confused  with  "correct.")  The  physical 
approach  is  the  actual,  physical  translation  of  the  data 
from  the  old  format  to  the  new,  and  the  actual  rewriting  of 
the  programs  to  the  new  environment.  The  logical  solution 
is  the  provision  of  new  schema  to  schema  mappings  which 
preserve  unmodified  the  old  view  of  the  data.  It  may  be 
said  that  the  cost  of  the  physical  solution  is  paid  all  at 
once  whereas  the  cost  of  the  logical  solution  is  paid  over 
time . 

3.5.1  Database  Translation. 

Does  the  ANSI/SPARC  framework  or  the  CODASYL  specifica- 
tions inhibit  the  task  of  data  translation?  Specifically, 
research  into  data  translation  has  shown  the  need  for  Stan- 
dard Data  Interchange  Format  (SDIF)  [NBS].  How  does  this 
fit   into  ANSI/SPARC  or  CODASYL? 

The  ANSI/SPARC  framework  was  thought  an  excellent  vehi- 
cle for  data  translation.  The  SDIF  was  identified  with  the 
conceptual  schema.  This  began  a  discussion  of  the  require- 
ments of  an  SDIF.  Like  the  conceptual  schema,  the  SDIF 
should  describe  the  semantics  of  the  data  explicitly.  A  set 
of  operations  on  the  semantic  constructs  of  the  SDIF  should 
be  defined.  The  ANSI/SPARC  report  was  criticized  for  not 
providing  for  these  operations.  The  transformations  which 
bring  the  source  database  expressed  in  the  SDIF  into  the 
target  database  is  described  by  an  expression  in  these 
operators . 

There  was  some  argument  that  this  approach  was  a  case 
of  "overkill."  The  complete  semantics  of  the  data  are  not 
required  in  the  specification  of  those  formal  transforma- 
tions needed  to  modify  data  structure.  A  semantically  com- 
plete conceptual  schema  facility  might  facilitate  data 
translation,  but  it  is  not  necessary. 
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The  CODASYL  specifications  were  felt  by  most  workshop 
participants  to  be  inadequate  to  play  the  role  of  an  SDIF. 
On  the  one  hand,  too  much  physical  detail  is  present.  On 
the  other  hand,  not  enough  semantic  detail  is  present.  A 
number  of  alternative  means  of  representing  semantics  were 
discussed,  of  which  the  Entity-Relationship  model  of  Peter 
S.   Chen  was  considered  by  some  to  be  appropriate. 

3.5.2  Program  Conversion. 

The  need  for  a  description  of  the  intention  of  the  pro- 
grams under  conversion,  expressed  at  a  high  level,  was  not- 
ed. This  brought  renewed  criticism  of  the  CODASYL  DML.  Su , 
based  on  recent  work  [SULO] ,  drew  the  diagram  in  Figure  8. 

3.5.3  Dynamic  Conversion. 

The  workshop  participants  had  already  criticized  the 
CODASYL  subschema  as  inadequate  as  an  external  mapping  fa- 
cility. They  did  not  comment  on  the  feasibility  of  ir\ternal 
mappings  which  would  allow  a  DBA  to  choose  not  to  undergo 
physical  data  translation.  They  felt  that  internal  mappings 
should  be  kept  simple  since,  unlike  external  mappings,  they 
must  be  invertible.  Where  a  given  external  view  may  not  al- 
low update,  changes  to  the  conceptual  database  must  always 
be  reflected   in  the  physical  database. 

An  approach  called  "dynamic  restructuring"  [GERR]  which 
combines  aspects  of  the  physical  and  logical  approaches  was 
mentioned.  Stored  records  are  marked  with  a  generation 
number.  Records  which  are  retrieved  bearing  an  outdated 
generation  number  are  converted  to  the  latest  format  before 
being  stored. 
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Figure  8 
Conversion  Processes 
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3.6  CONCLUSIONS 


The  group  reached  the  following  general  conclusions: 

1.  The  ANSI/SPARC  architecture  was  well  suited  for  pro- 
viding data  independence  with  respect  to  anticipated 
technological  development  in  computer  hardware  and 
so  f tware . 

2.  Development  of  end  user  facility  does  not  hinder 
data  independence, 

3.  A  data  dictionary  properly  used  provides  a  measure 
of  data  independence  with  any  DBMS  architecture. 
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